GEM ML Framework Demonstrator - Deforestation Detection¶

In these notebooks, we provide an in-depth example of how the GEM ML framework can be used for segmenting deforested areas using Sentinel-2 imagery as input and the TMF dataset as a reference. The idea is to use a neural network (NN) model for the analysis. Thanks to the flexibility of the GEM ML framework, we can easily substitute the model in the future by adjusting only the configuration file. We will have a look at the following notebooks separately:

  • 00_Configuration
  • 01_DataAcquisition
  • 02_DataNormalization
  • 03_TrainingValidationTesting
  • 04_Inference_Clouds

Authors: Michael Engel (m.engel@tum.de) and Joana Reuss (joana.reuss@tum.de)


Training, Validation and Testing¶

In this notebook, we will train, validate and test the model of choice.

In [1]:
import os
import sys
import platform
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import time
import natsort

import torch
import torch.multiprocessing as mp
from tensorboardX import SummaryWriter
from tensorboard import notebook

from sentinelhub import SHConfig, BBox, CRS, DataCollection, UtmZoneSplitter
from eolearn.core import FeatureType, EOPatch, MergeEOPatchesTask, MapFeatureTask, MergeFeatureTask, ZipFeatureTask, LoadTask, EONode, EOWorkflow, EOExecutor, OverwritePermission, SaveTask
from eolearn.io import SentinelHubDemTask, ExportToTiffTask, SentinelHubInputTask, SentinelHubEvalscriptTask, get_available_timestamps, ImportFromTiffTask
from eolearn.mask import CloudMaskTask, JoinMasksTask
from eolearn.features.feature_manipulation import SpatialResizeTask
from eolearn.features.utils import ResizeMethod, ResizeLib

import rasterio
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon,Point
import folium
from folium import plugins as foliumplugins

from libs.ConfigME import Config, importME
from libs.MergeTDigests import mergeTDigests
from libs.QuantileScaler_eolearn import QuantileScaler_eolearn_tdigest
from libs.Dataset_eolearn import Dataset_eolearn
from libs import AugmentME
from libs import ExecuteME

from tasks.TDigestTask import TDigestTask
from tasks.PickIdxTask import PickIdxTask
from tasks.SaveValidTask import SaveValidTask
from tasks.PyTorchTasks import ModelForwardTask

from utils.rasterio_reproject import rasterio_reproject
from utils.transforms import batchify, predict, mover, Torchify
from utils.parse_time_interval_observations import parse_time_interval_observations

print("Working Directory:",os.getcwd())
print("Environment:",os.environ['CONDA_DEFAULT_ENV'])
print("Executable:",sys.executable)
/home/michael/anaconda3/envs/eolearn_water/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Incorporating libs!
Incorporating tasks!
Incorporating utils!
Working Directory: /home/michael/Documents/GEM/TUM-Git/eo-learn-examples/GEM-ML/Example_DeforestationDetection
Environment: eolearn_water
Executable: /home/michael/anaconda3/envs/eolearn_water/bin/python

Config¶

First, we load our configuration file, which provides all the information we need throughout the script, and linuxify our paths (relevant if you are working on a Windows machine), since the eo-learn filesystem manager does not support backslashes for now.

In [2]:
#%% load configuration file
config = Config.LOAD("config.dill")

#%% linuxify
config.linuxify()
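Under the hood, linuxifying a path essentially means replacing Windows-style backslashes with forward slashes. A minimal sketch of the idea (the actual `Config.linuxify` implementation may differ):

```python
def linuxify_path(path: str) -> str:
    """Replace Windows-style backslashes with forward slashes."""
    return path.replace("\\", "/")

print(linuxify_path("results\\scaler\\scaler.dill"))  # results/scaler/scaler.dill
```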

Data Preparation¶

Dataloading¶

First, we need to get the paths for all samples within our training, validation and testing datasets, respectively.

In [3]:
#%% training samples
paths_train = [os.path.join(config["dir_train"],file).replace("\\","/") for file in os.listdir(config["dir_train"])]

#%% validation samples
paths_validation = [os.path.join(config["dir_validation"],file).replace("\\","/") for file in os.listdir(config["dir_validation"])]

#%% testing samples
paths_test = [os.path.join(config["dir_test"],file).replace("\\","/") for file in os.listdir(config["dir_test"])]

Quantile Scaler¶

As discussed in the previous notebook, we want to apply quantile scaling to our data, so we load the scaler we have already defined there.

In [4]:
Scaler = QuantileScaler_eolearn_tdigest.LOAD(os.path.join(config["dir_results"],config["savename_scaler"]))

Now, we are ready to define our datasets using Dataset_eolearn! Remember that PyTorch expects channels-first tensors, i.e. shape [batch_size x channels x height x width], whereas eo-learn stores its features channels-last. The QuantileScaler_eolearn_tdigest already takes care of this, as it was set up with transform=Torchify(1); for the reference and the mask, we use the Torchify class provided within the Dataset_eolearn module directly.
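We do not reproduce the Torchify class here, but its core operation is a channels-last to channels-first reordering; a rough sketch of that step (illustrative, not the framework's code):

```python
import numpy as np

def to_channels_first(arr: np.ndarray) -> np.ndarray:
    """Move the channel axis from last (eo-learn convention, H x W x C)
    to first (PyTorch convention, C x H x W)."""
    return np.moveaxis(arr, -1, 0)

patch = np.zeros((256, 256, 6), dtype=np.float32)  # eo-learn layout: H x W x C
print(to_channels_first(patch).shape)  # (6, 256, 256)
```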

In [5]:
#%% training dataset
dataset_train = Dataset_eolearn(
    paths = paths_train,
    feature_data = (FeatureType.DATA,"data"),
    feature_reference = (FeatureType.MASK_TIMELESS,"reference"),
    feature_mask = (FeatureType.MASK_TIMELESS,"mask_reference"),

    transform_data = Scaler,
    transform_reference = Torchify(1),
    transform_mask = Torchify(1),
    
    return_idx = True,
    return_path = False,

    torchdevice = None,
    torchtype_data = torch.FloatTensor,
    torchtype_reference = torch.LongTensor,
    torchtype_mask = torch.LongTensor,
)

#%% validation dataset
dataset_validation = Dataset_eolearn(
    paths = paths_validation,
    feature_data = (FeatureType.DATA,"data"),
    feature_reference = (FeatureType.MASK_TIMELESS,"reference"),
    feature_mask = (FeatureType.MASK_TIMELESS,"mask_reference"),

    transform_data = Scaler,
    transform_reference = Torchify(1),
    transform_mask = Torchify(1),
    
    return_idx = True,
    return_path = False,

    torchdevice = None,
    torchtype_data = torch.FloatTensor,
    torchtype_reference = torch.LongTensor,
    torchtype_mask = torch.LongTensor,
)

#%% testing dataset
dataset_test = Dataset_eolearn(
    paths = paths_test,
    feature_data = (FeatureType.DATA,"data"),
    feature_reference = (FeatureType.MASK_TIMELESS,"reference"),
    feature_mask = (FeatureType.MASK_TIMELESS,"mask_reference"),

    transform_data = Scaler,
    transform_reference = Torchify(1),
    transform_mask = Torchify(1),
    
    return_idx = True,
    return_path = False,

    torchdevice = None,
    torchtype_data = torch.FloatTensor,
    torchtype_reference = torch.LongTensor,
    torchtype_mask = torch.LongTensor,
)

Let's test our datasets!

In [6]:
sample_train = dataset_train[:config["batch_size"]]
print('Training Data Shape:',sample_train[0].shape)
print('Training Reference Shape:',sample_train[1].shape)
print('Training Mask Shape:',sample_train[2].shape)
print()

sample_validation = dataset_validation[:config["max_batch_size"]]
print('Validation Data Shape:',sample_validation[0].shape)
print('Validation Reference Shape:',sample_validation[1].shape)
print('Validation Mask Shape:',sample_validation[2].shape)
print()

sample_test = dataset_test[:config["max_batch_size"]]
print('Testing Data Shape:',sample_test[0].shape)
print('Testing Reference Shape:',sample_test[1].shape)
print('Testing Mask Shape:',sample_test[2].shape)
print()
Training Data Shape: torch.Size([12, 6, 256, 256])
Training Reference Shape: torch.Size([12, 256, 256])
Training Mask Shape: torch.Size([12, 256, 256])

Validation Data Shape: torch.Size([2, 6, 256, 256])
Validation Reference Shape: torch.Size([2, 256, 256])
Validation Mask Shape: torch.Size([2, 256, 256])

Testing Data Shape: torch.Size([2, 6, 256, 256])
Testing Reference Shape: torch.Size([2, 256, 256])
Testing Mask Shape: torch.Size([2, 256, 256])

Let's define a dataloader for each dataset. For validation and testing, we use twice the maximum batch size, since no gradient calculation is needed there and each sample therefore requires less memory.

In [7]:
#%% training dataloader
dataloader_train = torch.utils.data.DataLoader(
    dataset = dataset_train,
    batch_size = config["batch_size"],
    shuffle = True,
    sampler = None,
    batch_sampler = None,
    num_workers = 0 if platform.system()=="Windows" else config["threads"],
    collate_fn = None,
    pin_memory = False,
    drop_last = True,
    timeout = 0,
    worker_init_fn = None,
    multiprocessing_context = None,
    generator = None
)

#%% validation dataloader
dataloader_validation = torch.utils.data.DataLoader(
    dataset = dataset_validation,
    batch_size = config["max_batch_size"]*2,
    shuffle = False,
    sampler = None,
    batch_sampler = None,
    num_workers = 0 if platform.system()=="Windows" else config["threads"],
    collate_fn = None,
    pin_memory = False,
    drop_last = True,
    timeout = 0,
    worker_init_fn = None,
    multiprocessing_context = None,
    generator = None
)

#%% testing dataloader
dataloader_test = torch.utils.data.DataLoader(
    dataset = dataset_test,
    batch_size = config["max_batch_size"]*2,
    shuffle = False,
    sampler = None,
    batch_sampler = None,
    num_workers = 0 if platform.system()=="Windows" else config["threads"],
    collate_fn = None,
    pin_memory = False,
    drop_last = True,
    timeout = 0,
    worker_init_fn = None,
    multiprocessing_context = None,
    generator = None
)
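The larger validation and testing batches are affordable because evaluation runs without gradient tracking. A minimal sketch of that pattern with a toy stand-in model (names are illustrative, not the framework's):

```python
import torch

model = torch.nn.Conv2d(6, 4, kernel_size=1)  # toy stand-in for the real network
batch = torch.randn(2, 6, 8, 8)

model.eval()
with torch.no_grad():  # disables autograd bookkeeping -> less memory per batch
    logits = model(batch)

print(logits.shape)          # torch.Size([2, 4, 8, 8])
print(logits.requires_grad)  # False
```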

Model¶

It is time to initialize our model. To do so, we use the importME method, which keeps us flexible regarding the chosen model architecture and lets us swap it easily in the future.

In [8]:
#%% import model
module_model = importME(config["module_model"])

#%% initialize model
model = module_model(**config["kwargs_model"])
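`importME` resolves a dotted string from the configuration to a Python object, so swapping the model only requires editing the config file. The core mechanism is dynamic importing; a rough sketch of the idea (not the actual `importME` code):

```python
import importlib

def import_from_string(dotted: str):
    """Resolve 'package.module.Attribute' to the corresponding Python object."""
    module_path, _, attribute = dotted.rpartition(".")
    return getattr(importlib.import_module(module_path), attribute)

Linear = import_from_string("torch.nn.Linear")
print(Linear(3, 2))  # Linear(in_features=3, out_features=2, bias=True)
```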

We want to augment the model such that it fits into our training pipeline. We will add the following functionalities:

  • IO methods for saving and loading
  • A method to access the gradients during training, used for monitoring the model's "brainwaves"
  • A method that counts the number of model parameters

The benefit of adding these methods becomes clear once you want to change the model architecture without touching the IO interface of your code.
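`AugmentME` attaches these methods to the model instance at runtime. The mechanism is essentially monkey-patching; a minimal sketch of the idea (illustrative, not the actual `AugmentME` code):

```python
import os
import tempfile
import types

import torch

model = torch.nn.Linear(3, 2)  # toy stand-in; works for any nn.Module

def save(self, path):
    """Save the model weights regardless of the concrete architecture."""
    torch.save(self.state_dict(), path)

# attach the method to this instance only -> a uniform IO interface
model.save = types.MethodType(save, model)

path = os.path.join(tempfile.mkdtemp(), "toy_model.pt")
model.save(path)
print(os.path.exists(path))  # True
```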

In [9]:
#%% general IO
AugmentME.augment_IO(model,savekey='save',loadkey='load',mode='torch')

#%% checkpoint saving
AugmentME.augment_checkpoint(model,key='save_checkpoint',mode='torch')

#%% gradient method
AugmentME.augment_gradient(model,key='get_gradient',mode=None)

#%% number of parameters
AugmentME.augment_Ntheta(model,key="get_Ntheta")
Out[9]:
True

To check whether the augmentation worked, let's have a look at the number of parameters.

In [10]:
#%% number of parameters
print("Number of parameters:",model.get_Ntheta())
Number of parameters: 22447636
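For a plain PyTorch module, the same count can be obtained directly; `get_Ntheta` presumably does something along these lines (a sketch, not the actual `AugmentME` code):

```python
import torch

def count_parameters(model: torch.nn.Module, trainable_only: bool = False) -> int:
    """Sum the element counts of all (optionally only trainable) parameters."""
    return sum(
        p.numel() for p in model.parameters()
        if not trainable_only or p.requires_grad
    )

toy = torch.nn.Linear(10, 5)  # 10*5 weights + 5 biases
print(count_parameters(toy))  # 55
```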

Training pipeline¶

Loss function¶

Before we can start training our model, we have to define a loss function. We will keep it as flexible as the model itself and use importME.

In [11]:
loss_function = importME(config["module_loss"])(**config["kwargs_loss"])

Optimizer¶

No optimization without an optimizer! To avoid device mismatch errors, we have to move our model to the device before defining our optimizer.

In [12]:
#%% send model to device to avoid device errors
model.to(config["device"])
Out[12]:
DeepLabV3Plus(
  (encoder): ResNetEncoder(
    (conv1): Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (layer2): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (layer3): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (4): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (5): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (layer4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), dilation=(2, 2), bias=False)
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (decoder): DeepLabV3PlusDecoder(
    (aspp): Sequential(
      (0): ASPP(
        (convs): ModuleList(
          (0): Sequential(
            (0): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): ASPPSeparableConv(
            (0): SeparableConv2d(
              (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), groups=512, bias=False)
              (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            )
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (2): ASPPSeparableConv(
            (0): SeparableConv2d(
              (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(24, 24), dilation=(24, 24), groups=512, bias=False)
              (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            )
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (3): ASPPSeparableConv(
            (0): SeparableConv2d(
              (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(36, 36), dilation=(36, 36), groups=512, bias=False)
              (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            )
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (4): ASPPPooling(
            (0): AdaptiveAvgPool2d(output_size=1)
            (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (3): ReLU()
          )
        )
        (project): Sequential(
          (0): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Dropout(p=0.5, inplace=False)
        )
      )
      (1): SeparableConv2d(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
        (1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
      (2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU()
    )
    (up): UpsamplingBilinear2d(scale_factor=4.0, mode=bilinear)
    (block1): Sequential(
      (0): Conv2d(64, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (block2): Sequential(
      (0): SeparableConv2d(
        (0): Conv2d(304, 304, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=304, bias=False)
        (1): Conv2d(304, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
  (segmentation_head): SegmentationHead(
    (0): Conv2d(256, 4, kernel_size=(1, 1), stride=(1, 1))
    (1): UpsamplingBilinear2d(scale_factor=4.0, mode=bilinear)
    (2): Activation(
      (activation): Identity()
    )
  )
)

Now, we can define our optimizer with the model parameters already on our chosen device!

In [13]:
optimizer = importME(config["module_optimizer"])(model.parameters(),**config["kwargs_optimizer"])

To assess the performance of our model, we load a metric.

In [14]:
metric = importME(config["module_metric"])
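The concrete metric is defined in the configuration file. For a segmentation task like this one, a typical choice is intersection over union (IoU); a minimal per-class sketch (illustrative, not necessarily the configured metric):

```python
import torch

def iou(prediction: torch.Tensor, reference: torch.Tensor, cls: int) -> float:
    """Intersection over union for one class, given integer label maps."""
    pred_mask = prediction == cls
    ref_mask = reference == cls
    intersection = (pred_mask & ref_mask).sum().item()
    union = (pred_mask | ref_mask).sum().item()
    return intersection / union if union > 0 else float("nan")

pred = torch.tensor([[0, 1], [1, 1]])
ref = torch.tensor([[0, 1], [0, 1]])
print(iou(pred, ref, cls=1))  # 2 / 3 ≈ 0.667
```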

Experiment logger: Tensorboard¶

Of course, we would like to track the progress of our training procedure. Hence, we define a TensorBoard SummaryWriter.

In [15]:
writer = SummaryWriter(config["dir_tensorboard"])

The TensorBoard SummaryWriter enables some nice extras, for example adding a graph of our model.

In [16]:
writer.add_graph(model, sample_train[0].to(config["device"]))
/home/michael/anaconda3/envs/eolearn_water/lib/python3.8/site-packages/segmentation_models_pytorch/base/model.py:16: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if h % output_stride != 0 or w % output_stride != 0:

Furthermore, we would like to make our experiment reproducible. Hence, we set the seeds such that all random number generation and shuffling is done in a deterministic manner.

In [17]:
#%% reproducibility
np.random.seed(config["seed"])
torch.manual_seed(config["seed"])
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

To recover from a premature exit of the training procedure, we insert a resume flag here. It lets the user resume from a specific checkpoint or automatically pick the most recent one.
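The checkpoint dictionary loaded in the resume branch is the counterpart of what the training loop writes at each epoch. A minimal sketch of saving such a checkpoint (the field names mirror this notebook's loading code; the toy model and values are illustrative):

```python
import os
import tempfile

import torch

model = torch.nn.Linear(3, 2)  # toy stand-ins for model and optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

checkpoint = {
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "loss": 0.42,       # illustrative values
    "bestloss": 0.40,
    "bestmetric": 0.85,
    "epoch": 7,
    "logstep": 123,
}
path = os.path.join(tempfile.mkdtemp(), "checkpoint_epoch007.pt")
torch.save(checkpoint, path)

restored = torch.load(path, map_location="cpu")
print(restored["epoch"])  # 7
```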

In [18]:
#%% resume flag
resume = False

#%% resume case
if resume:
    if resume is True:  # pick the most recent checkpoint automatically
        resume = os.path.join(config["dir_checkpoints"],natsort.natsorted(os.listdir(config["dir_checkpoints"]))[-1])
    
    print(f'Loading Checkpoint {resume}!')
    checkpoint = torch.load(resume,map_location=config["device"])
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    loss = checkpoint['loss']
    bestloss = checkpoint['bestloss']
    bestmetric = checkpoint["bestmetric"]
    epoch_ = checkpoint['epoch']+1
    logstep_ = checkpoint['logstep']
else:
    epoch_ = 0
    logstep_ = 0
    bestloss = np.inf
    bestmetric = 0 if type(metric) is not list and type(metric) is not np.ndarray else [0 for _ in range(len(metric))]

model.train()
Out[18]:
DeepLabV3Plus(
  (encoder): ResNetEncoder(
    (conv1): Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (layer2): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (layer3): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (4): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (5): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (layer4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), dilation=(2, 2), bias=False)
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (decoder): DeepLabV3PlusDecoder(
    (aspp): Sequential(
      (0): ASPP(
        (convs): ModuleList(
          (0): Sequential(
            (0): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): ASPPSeparableConv(
            (0): SeparableConv2d(
              (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), groups=512, bias=False)
              (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            )
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (2): ASPPSeparableConv(
            (0): SeparableConv2d(
              (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(24, 24), dilation=(24, 24), groups=512, bias=False)
              (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            )
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (3): ASPPSeparableConv(
            (0): SeparableConv2d(
              (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(36, 36), dilation=(36, 36), groups=512, bias=False)
              (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            )
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (4): ASPPPooling(
            (0): AdaptiveAvgPool2d(output_size=1)
            (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (3): ReLU()
          )
        )
        (project): Sequential(
          (0): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Dropout(p=0.5, inplace=False)
        )
      )
      (1): SeparableConv2d(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
        (1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
      (2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU()
    )
    (up): UpsamplingBilinear2d(scale_factor=4.0, mode=bilinear)
    (block1): Sequential(
      (0): Conv2d(64, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (block2): Sequential(
      (0): SeparableConv2d(
        (0): Conv2d(304, 304, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=304, bias=False)
        (1): Conv2d(304, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
  (segmentation_head): SegmentationHead(
    (0): Conv2d(256, 4, kernel_size=(1, 1), stride=(1, 1))
    (1): UpsamplingBilinear2d(scale_factor=4.0, mode=bilinear)
    (2): Activation(
      (activation): Identity()
    )
  )
)

Starting training!¶

Let's start the training loop! To stay within GPU memory limits, each batch is split into minibatches of at most `max_batch_size` samples; gradients are accumulated over these minibatches before a single optimizer step. Every `eval_freq` epochs, the model is evaluated on the validation set, and checkpoints are stored whenever the validation loss or one of the validation metrics improves.
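Before running the full loop, the minibatch bookkeeping it relies on can be sketched in isolation: the index bounds per minibatch and the recombination of per-minibatch losses weighted by the number of valid (unmasked) pixels, so the reported loss matches a full-batch pass. The helper names below are illustrative and not part of the GEM ML framework:

```python
import math

def minibatch_bounds(n_samples, max_batch_size):
    """Yield (low, high) index pairs that split a batch of n_samples
    into minibatches of at most max_batch_size samples."""
    batchcount = math.ceil(n_samples / max_batch_size)
    for p in range(batchcount):
        low = p * max_batch_size
        # the last minibatch takes whatever samples remain
        high = n_samples if p == batchcount - 1 else (p + 1) * max_batch_size
        yield low, high

def weighted_loss(losses, weights):
    """Recombine per-minibatch losses, weighting each by its share of
    valid pixels (as counted via the mask in the training loop)."""
    total = sum(weights)
    return sum(w / total * l for w, l in zip(weights, losses))

print(list(minibatch_bounds(10, 4)))      # [(0, 4), (4, 8), (8, 10)]
print(weighted_loss([1.0, 2.0], [3, 1]))  # 1.25
```

The same weighted-average pattern is used again at the end of each validation pass to aggregate the per-step losses and metrics into total values.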

In [19]:
#%% training loop
print('Start training...')
logstep = -1+logstep_
for epoch in range(config["n_epochs"]-epoch_):
    epoch = epoch+epoch_
    for step, (x, y, mask, idx) in enumerate(dataloader_train):
        print('epoch %i step %i'%(epoch,step))
        
        #%%% clean cache of GPU
        torch.cuda.empty_cache()

        #%%% compute logstep
        logstep = logstep+1

        #%%% zero gradients
        optimizer.zero_grad(set_to_none=True)
        
        #%%% determine number of minibatches
        if type(x)==list:
            batchcount = int(np.ceil(len(x[0])/config["max_batch_size"]))
        else:
            batchcount = int(np.ceil(len(x)/config["max_batch_size"]))

        out = []
        loss = 0
        #%%% minibatch-loop
        for p in range(batchcount):
            #%%%% determine indices
            lowidx = p*config["max_batch_size"]
            if p==batchcount-1:
                if type(x)==list:
                    highidx = len(x[0])
                else:
                    highidx = len(x)
            else:
                highidx = (p+1)*config["max_batch_size"]
                
            if type(x)==list:
                tmp_x = [torch.index_select(x_,dim=0,index=torch.arange(lowidx,highidx)).detach() for x_ in x]
            else:
                tmp_x = torch.index_select(x,dim=0,index=torch.arange(lowidx,highidx)).detach()
            
            tmp_y = torch.index_select(y,dim=0,index=torch.arange(lowidx,highidx)).detach()
            tmp_mask = torch.index_select(mask,dim=0,index=torch.arange(lowidx,highidx)).detach()
        
            #%%%% forward pass
            if type(tmp_x)==list:
                tmp_out = model.forward([item_.to(config["device"]) for item_ in tmp_x])
            else:
                tmp_out = model.forward(tmp_x.to(config["device"]))

            #%%%% compute loss
            tmp_loss = loss_function(tmp_out.softmax(1),tmp_y.squeeze(1).to(config["device"]))
            tmp_loss = (tmp_loss*tmp_mask.long().squeeze(1).to(config["device"])).sum() / (torch.count_nonzero(tmp_mask.long().to(config["device"])))

            #%%%% compute gradient
            tmp_loss.backward()
            
            #%%%% collect minibatch output
            out.append(tmp_out.detach().cpu())
            loss = loss+torch.count_nonzero(tmp_mask.long().detach().cpu())/torch.count_nonzero(mask.long().detach().cpu())*tmp_loss.detach().cpu()

            #%%%% free intermediate tensors to release GPU memory
            del(tmp_x)
            del(tmp_y)
            del(tmp_mask)
            del(tmp_loss)
            del(tmp_out)

        #%%% update model parameters
        optimizer.step()
        
        #%%% compute metric
        out = torch.concat(out,dim=0)
        if type(metric)==list:
            train_acc = [metric_(out,y.cpu().detach(),mask.cpu().detach()) for metric_ in metric]
        else:
            train_acc = metric(out,y.cpu().detach(),mask.cpu().detach())

        #%%% printing stuff
        print(
            "[{}] Training Step: {:d}/{:d} {:d}.{:d}, \tbatch_size: {} \tLoss: {:.4f} \tAcc: {}".format(
                dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S"),
                logstep+1,
                len(dataloader_train)*config["n_epochs"],
                epoch,
                step,
                config["batch_size"],
                loss.mean(),
                {metric_.__name__:train_acc_ for metric_,train_acc_ in zip(metric,train_acc)} if type(metric)==list else train_acc
            )
        )

        #%%% write to tensorboard
        #%%%% log loss
        writer.add_scalar(f'LossTraining/{type(loss_function).__name__}', loss, global_step=logstep)
        
        #%%%% log metric
        if type(metric)==list:
            writer.add_scalars('AccuracyTraining',{metric_.__name__:train_acc_ for metric_,train_acc_ in zip(metric,train_acc)},global_step=logstep)
        else:
            writer.add_scalar('AccuracyTraining', train_acc, global_step=logstep)
        
        #%%%% gradients
        writer.add_histogram('GradientsTraining/AllParams', model.get_gradient(mode='vec',index=None), global_step=logstep, bins=50, walltime=None, max_bins=100)
        for name,grad in model.get_gradient(mode='named params',device="cpu",detach=True):
            writer.add_histogram(f'NamedGradientsTraining/{name}', grad, global_step=logstep, bins=50, walltime=None, max_bins=100)
        
    #%%% intermediate evaluation of validation set
    if config["eval_freq"] and (epoch+1)%config["eval_freq"]==0:
        print()
        model.eval()
        loss_val = []
        acc_val = []
        weights_val = []
        with torch.no_grad():
            fig, axis = plt.subplots(nrows=len(dataloader_validation)*2, ncols=dataloader_validation.batch_size, figsize=(3*dataloader_validation.batch_size,2*3*len(dataloader_validation)))
            fig.suptitle('Validation Data %i'%logstep)                
            for step_validation, (x_validation, y_validation, mask_validation, idx_validation) in enumerate(dataloader_validation):
                print('validation step %i'%(step_validation))

                #%%%% clean cache of GPU
                torch.cuda.empty_cache()

                #%%%% forward pass
                if type(x_validation)==list:
                    out_validation = model.forward([item_.to(config["device"]) for item_ in x_validation])
                else:
                    out_validation = model.forward(x_validation.to(config["device"]))

                #%%%% compute loss
                loss_validation = loss_function(out_validation.softmax(1),y_validation.squeeze(1).to(config["device"]))
                loss_validation = (loss_validation*mask_validation.long().squeeze(1).to(config["device"])).sum() / (torch.count_nonzero(mask_validation.long().to(config["device"])))

                #%%%% compute metric
                if type(metric)==list:
                    validation_acc = [metric_(out_validation.cpu().detach(),y_validation.cpu().detach(),mask_validation.cpu().detach()) for metric_ in metric]
                else:
                    validation_acc = metric(out_validation.cpu().detach(),y_validation.cpu().detach(),mask_validation.cpu().detach())

                #%%%% printing stuff
                print(
                    "[{}] Validation Step: {:d}/{:d}, \tbatch_size: {} \tLoss: {:.4f} \tAcc: {}".format(
                        dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S"),
                        step_validation+1,
                        len(dataloader_validation),
                        dataloader_validation.batch_size,
                        loss_validation.mean(),
                        {metric_.__name__:validation_acc_ for metric_,validation_acc_ in zip(metric,validation_acc)} if type(metric)==list else validation_acc
                    )
                )
                
                #%%%% collect predictions
                predictions_validation = torch.argmax(out_validation,1).cpu().detach().numpy()
                
                axis[step_validation*2][0].set_ylabel("Prediction")
                axis[step_validation*2+1][0].set_ylabel("Reference")
                for i in range(predictions_validation.shape[0]): # the last batch may be smaller than batch_size
                    axis[step_validation*2][i].imshow(predictions_validation[i].squeeze(),cmap=config["cmap_reference"],vmin=0,vmax=config["num_classes"])
                    axis[step_validation*2][i].set_xticks([])
                    axis[step_validation*2][i].set_yticks([])
                    
                    axis[step_validation*2+1][i].imshow(y_validation.cpu().detach().numpy()[i].squeeze(),cmap=config["cmap_reference"],vmin=0,vmax=config["num_classes"])
                    axis[step_validation*2+1][i].set_xticks([])
                    axis[step_validation*2+1][i].set_yticks([])
                    
                #%%%% collect loss and accuracy
                loss_val.append(loss_validation.cpu().detach().numpy())
                acc_val.append(validation_acc)
                weights_val.append(torch.count_nonzero(mask_validation).cpu().detach().numpy())

            #%%%% total loss and accuracy
            total = np.sum([np.sum(weight_) for weight_ in weights_val])
            loss_val_total = np.sum([weight_/total*loss_ for weight_,loss_ in zip(weights_val,loss_val)])
            if type(metric)==list:
                acc_val_total = [np.sum([weight_/total*acc_[i] for weight_,acc_ in zip(weights_val,acc_val)]) for i in range(len(metric))]
            else:
                acc_val_total = np.sum([weight_/total*acc_ for weight_,acc_ in zip(weights_val,acc_val)])
            
            # print total values
            print(
                "[{}] Validation: \tTotal Loss: {:.4f} \tTotal Acc: {}".format(
                    dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S"),
                    loss_val_total,
                    {metric_.__name__:validation_acc_ for metric_,validation_acc_ in zip(metric,acc_val_total)} if type(metric)==list else acc_val_total
                )
            )

            #%%%% write to tensorboard
            #%%%%% log loss
            writer.add_scalar(f'LossValidation/{type(loss_function).__name__}', loss_val_total, global_step=logstep)

            #%%%%% log metric
            if type(metric)==list:
                writer.add_scalars('AccuracyValidation',{metric_.__name__:validation_acc_ for metric_,validation_acc_ in zip(metric,acc_val_total)},global_step=logstep)
            else:
                writer.add_scalar('AccuracyValidation', acc_val_total, global_step=logstep)
            
            #%%%%% log figure
            plt.tight_layout()
            plt.savefig(fname=os.path.join(config["dir_imgs_validation"],"PredictionValidation_%i"%logstep), dpi="figure")
            writer.add_figure(tag="PredictionValidation", figure=fig, global_step=logstep, close=True, walltime=None)

        model.train()
        print()
        
        #%%% checkpoint for best validation loss
        if config["checkpoint_bestloss"] and bestloss>loss_val_total:
            bestloss = loss_val_total
            print("New best validation loss! Storing checkpoint and model!")
            model.save_checkpoint(
                savename = os.path.join(config["dir_checkpoints"],'checkpoint_bestloss.tar'),
                epoch = epoch,
                logstep = logstep,
                optimizer_state_dict = optimizer.state_dict(),
                loss = loss,
                bestloss = bestloss,
                bestmetric = acc_val_total # reasonable if someone would like to restart training from that checkpoint
            )
            model.save(savename=os.path.join(config["dir_results"],config["model_savename_inference_bestloss"]),mode='inference')
            model.save(savename=os.path.join(config["dir_results"],config["model_savename_bestloss"]),mode='entirely')
            
        #%%% checkpoint for best validation metric(s)
        if config["checkpoint_bestmetric"]:
            if type(metric)==list:
                for m_, (metric_,validation_acc_) in enumerate(zip(metric,acc_val_total)):
                    if bestmetric[m_]<validation_acc_:
                        bestmetric[m_] = validation_acc_
                        print(f"New best validation metric {metric_.__name__}! Storing checkpoint and model!")
                        model.save_checkpoint(
                            savename = os.path.join(config["dir_checkpoints"],f'checkpoint_bestmetric_{metric_.__name__}.tar'),
                            epoch = epoch,
                            logstep = logstep,
                            optimizer_state_dict = optimizer.state_dict(),
                            loss = loss,
                            bestloss = loss_val_total, # reasonable if someone would like to restart training from that checkpoint
                            bestmetric = bestmetric
                        )
                        model.save(savename=os.path.join(config["dir_results"],config["model_savename_inference_bestmetric"]+f"_{metric_.__name__}"),mode='inference')
                        model.save(savename=os.path.join(config["dir_results"],config["model_savename_bestmetric"]+f"_{metric_.__name__}"),mode='entirely')
            else:
                if bestmetric<acc_val_total:
                    bestmetric = acc_val_total
                    print("New best validation metric! Storing checkpoint and model!")
                    model.save_checkpoint(
                        savename = os.path.join(config["dir_checkpoints"],'checkpoint_bestmetric.tar'),
                        epoch = epoch,
                        logstep = logstep,
                        optimizer_state_dict = optimizer.state_dict(),
                        loss = loss,
                        bestloss = loss_val_total, # reasonable if someone would like to restart training from that checkpoint
                        bestmetric = bestmetric
                    )
                    model.save(savename=os.path.join(config["dir_results"],config["model_savename_inference_bestmetric"]),mode='inference')
                    model.save(savename=os.path.join(config["dir_results"],config["model_savename_bestmetric"]),mode='entirely')

    #%%% checkpoint
    if config["checkpoint_freq"] and (epoch+1)%config["checkpoint_freq"]==0:
        model.save_checkpoint(
            savename = os.path.join(config["dir_checkpoints"],f'checkpoint_{logstep}_{epoch}_{step}.tar'),
            epoch = epoch,
            logstep = logstep,
            optimizer_state_dict = optimizer.state_dict(),
            loss = loss,
            bestloss = loss_val_total
        )

#%% save model
print('saving final checkpoint!')
model.save_checkpoint(savename=os.path.join(config["dir_checkpoints"],f'checkpoint_{logstep}_{epoch}_{step}.tar'), epoch=epoch, logstep=logstep, optimizer_state_dict=optimizer.state_dict(), loss=loss)
print('saving inference model')
model.save(savename=os.path.join(config["dir_results"],config["model_savename_inference"]),mode='inference')
print('saving entire model')
model.save(savename=os.path.join(config["dir_results"],config["model_savename"]),mode='entirely')
Start training...
epoch 0 step 0
[2023-02-13T15-29-50] Training Step: 1/192 0.0, 	batch_size: 12 	Loss: 1.3909 	Acc: {'accuracy': tensor(0.2314), 'cohen_kappa': -0.05557622815866159}
epoch 0 step 1
[2023-02-13T15-29-52] Training Step: 2/192 0.1, 	batch_size: 12 	Loss: 1.2519 	Acc: {'accuracy': tensor(0.5437), 'cohen_kappa': 0.2546997744812055}
epoch 0 step 2
[2023-02-13T15-29-54] Training Step: 3/192 0.2, 	batch_size: 12 	Loss: 1.3069 	Acc: {'accuracy': tensor(0.4159), 'cohen_kappa': 0.08775329156226808}
epoch 1 step 0
[2023-02-13T15-29-56] Training Step: 4/192 1.0, 	batch_size: 12 	Loss: 1.2904 	Acc: {'accuracy': tensor(0.4400), 'cohen_kappa': 0.12432958128214222}
epoch 1 step 1
[2023-02-13T15-29-58] Training Step: 5/192 1.1, 	batch_size: 12 	Loss: 1.1497 	Acc: {'accuracy': tensor(0.5859), 'cohen_kappa': 0.3175880981314283}
epoch 1 step 2
[2023-02-13T15-29-59] Training Step: 6/192 1.2, 	batch_size: 12 	Loss: 1.0248 	Acc: {'accuracy': tensor(0.7305), 'cohen_kappa': 0.48029103065277545}

validation step 0
[2023-02-13T15-30-01] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.1770 	Acc: {'accuracy': tensor(0.5667), 'cohen_kappa': 0.07761842285558429}
validation step 1
[2023-02-13T15-30-01] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.6286 	Acc: {'accuracy': tensor(0.1150), 'cohen_kappa': 0.030238133005970225}
[2023-02-13T15-30-01] Validation: 	Total Loss: 1.3882 	Total Acc: {'accuracy': 0.35546315, 'cohen_kappa': 0.05546294690056283}

New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 2 step 0
[2023-02-13T15-30-05] Training Step: 7/192 2.0, 	batch_size: 12 	Loss: 1.2078 	Acc: {'accuracy': tensor(0.5255), 'cohen_kappa': 0.26827887659157257}
epoch 2 step 1
[2023-02-13T15-30-07] Training Step: 8/192 2.1, 	batch_size: 12 	Loss: 1.0725 	Acc: {'accuracy': tensor(0.6606), 'cohen_kappa': 0.4207766294795455}
epoch 2 step 2
[2023-02-13T15-30-09] Training Step: 9/192 2.2, 	batch_size: 12 	Loss: 1.0924 	Acc: {'accuracy': tensor(0.6567), 'cohen_kappa': 0.38959210215618445}
epoch 3 step 0
[2023-02-13T15-30-11] Training Step: 10/192 3.0, 	batch_size: 12 	Loss: 1.0235 	Acc: {'accuracy': tensor(0.7188), 'cohen_kappa': 0.5100715106159788}
epoch 3 step 1
[2023-02-13T15-30-12] Training Step: 11/192 3.1, 	batch_size: 12 	Loss: 1.0576 	Acc: {'accuracy': tensor(0.6665), 'cohen_kappa': 0.43052683306991246}
epoch 3 step 2
[2023-02-13T15-30-14] Training Step: 12/192 3.2, 	batch_size: 12 	Loss: 1.1070 	Acc: {'accuracy': tensor(0.6053), 'cohen_kappa': 0.3970551077101635}

validation step 0
[2023-02-13T15-30-15] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.1561 	Acc: {'accuracy': tensor(0.5861), 'cohen_kappa': 0.1837432202180206}
validation step 1
[2023-02-13T15-30-16] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.4820 	Acc: {'accuracy': tensor(0.2266), 'cohen_kappa': 0.09576103632142141}
[2023-02-13T15-30-16] Validation: 	Total Loss: 1.3085 	Total Acc: {'accuracy': 0.4180367, 'cohen_kappa': 0.1426019109103221}

New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 4 step 0
[2023-02-13T15-30-20] Training Step: 13/192 4.0, 	batch_size: 12 	Loss: 1.0958 	Acc: {'accuracy': tensor(0.6251), 'cohen_kappa': 0.42262457553045585}
epoch 4 step 1
[2023-02-13T15-30-21] Training Step: 14/192 4.1, 	batch_size: 12 	Loss: 1.0808 	Acc: {'accuracy': tensor(0.6715), 'cohen_kappa': 0.4779777283634722}
epoch 4 step 2
[2023-02-13T15-30-23] Training Step: 15/192 4.2, 	batch_size: 12 	Loss: 0.9874 	Acc: {'accuracy': tensor(0.7719), 'cohen_kappa': 0.6071755504406604}
epoch 5 step 0
[2023-02-13T15-30-25] Training Step: 16/192 5.0, 	batch_size: 12 	Loss: 1.0864 	Acc: {'accuracy': tensor(0.6887), 'cohen_kappa': 0.4960389821937736}
epoch 5 step 1
[2023-02-13T15-30-26] Training Step: 17/192 5.1, 	batch_size: 12 	Loss: 1.0566 	Acc: {'accuracy': tensor(0.7050), 'cohen_kappa': 0.5257463438889118}
epoch 5 step 2
[2023-02-13T15-30-28] Training Step: 18/192 5.2, 	batch_size: 12 	Loss: 1.0404 	Acc: {'accuracy': tensor(0.7006), 'cohen_kappa': 0.5064178498015568}

validation step 0
[2023-02-13T15-30-29] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.1088 	Acc: {'accuracy': tensor(0.6374), 'cohen_kappa': 0.39409090465180574}
validation step 1
[2023-02-13T15-30-29] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.1728 	Acc: {'accuracy': tensor(0.5797), 'cohen_kappa': 0.372917123934162}
[2023-02-13T15-30-29] Validation: 	Total Loss: 1.1387 	Total Acc: {'accuracy': 0.6103928, 'cohen_kappa': 0.3841898426056335}

New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 6 step 0
[2023-02-13T15-30-34] Training Step: 19/192 6.0, 	batch_size: 12 	Loss: 1.0333 	Acc: {'accuracy': tensor(0.7082), 'cohen_kappa': 0.5439953877877608}
epoch 6 step 1
[2023-02-13T15-30-35] Training Step: 20/192 6.1, 	batch_size: 12 	Loss: 0.9699 	Acc: {'accuracy': tensor(0.7846), 'cohen_kappa': 0.6457816099441178}
epoch 6 step 2
[2023-02-13T15-30-37] Training Step: 21/192 6.2, 	batch_size: 12 	Loss: 1.0638 	Acc: {'accuracy': tensor(0.6825), 'cohen_kappa': 0.4938928627031214}
epoch 7 step 0
[2023-02-13T15-30-39] Training Step: 22/192 7.0, 	batch_size: 12 	Loss: 0.9985 	Acc: {'accuracy': tensor(0.7467), 'cohen_kappa': 0.5796006905976263}
epoch 7 step 1
[2023-02-13T15-30-40] Training Step: 23/192 7.1, 	batch_size: 12 	Loss: 1.0097 	Acc: {'accuracy': tensor(0.7341), 'cohen_kappa': 0.5894811382773071}
epoch 7 step 2
[2023-02-13T15-30-42] Training Step: 24/192 7.2, 	batch_size: 12 	Loss: 0.9810 	Acc: {'accuracy': tensor(0.7683), 'cohen_kappa': 0.6354116920304537}

validation step 0
[2023-02-13T15-30-44] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.2740 	Acc: {'accuracy': tensor(0.4519), 'cohen_kappa': 0.14831940364377505}
validation step 1
[2023-02-13T15-30-44] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.0452 	Acc: {'accuracy': tensor(0.6973), 'cohen_kappa': 0.4798744850683122}
[2023-02-13T15-30-44] Validation: 	Total Loss: 1.1670 	Total Acc: {'accuracy': 0.5666326, 'cohen_kappa': 0.30335772564606966}

saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 8 step 0
[2023-02-13T15-30-46] Training Step: 25/192 8.0, 	batch_size: 12 	Loss: 1.0354 	Acc: {'accuracy': tensor(0.7091), 'cohen_kappa': 0.5514917751169328}
epoch 8 step 1
[2023-02-13T15-30-47] Training Step: 26/192 8.1, 	batch_size: 12 	Loss: 0.9314 	Acc: {'accuracy': tensor(0.8112), 'cohen_kappa': 0.6780455239697434}
epoch 8 step 2
[2023-02-13T15-30-49] Training Step: 27/192 8.2, 	batch_size: 12 	Loss: 1.0057 	Acc: {'accuracy': tensor(0.7384), 'cohen_kappa': 0.5964450949672223}
epoch 9 step 0
[2023-02-13T15-30-51] Training Step: 28/192 9.0, 	batch_size: 12 	Loss: 1.0137 	Acc: {'accuracy': tensor(0.7309), 'cohen_kappa': 0.5903812894133914}
epoch 9 step 1
[2023-02-13T15-30-52] Training Step: 29/192 9.1, 	batch_size: 12 	Loss: 0.9804 	Acc: {'accuracy': tensor(0.7617), 'cohen_kappa': 0.6213388239819887}
epoch 9 step 2
[2023-02-13T15-30-54] Training Step: 30/192 9.2, 	batch_size: 12 	Loss: 0.9575 	Acc: {'accuracy': tensor(0.7906), 'cohen_kappa': 0.6431456284613131}

validation step 0
[2023-02-13T15-30-55] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.0775 	Acc: {'accuracy': tensor(0.6493), 'cohen_kappa': 0.4464086271346316}
validation step 1
[2023-02-13T15-30-55] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9908 	Acc: {'accuracy': tensor(0.7531), 'cohen_kappa': 0.5805579069628093}
[2023-02-13T15-30-55] Validation: 	Total Loss: 1.0370 	Total Acc: {'accuracy': 0.697852, 'cohen_kappa': 0.5091381113368234}

New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 10 step 0
[2023-02-13T15-30-59] Training Step: 31/192 10.0, 	batch_size: 12 	Loss: 1.0150 	Acc: {'accuracy': tensor(0.7323), 'cohen_kappa': 0.596329779144826}
epoch 10 step 1
[2023-02-13T15-31-01] Training Step: 32/192 10.1, 	batch_size: 12 	Loss: 0.9456 	Acc: {'accuracy': tensor(0.7996), 'cohen_kappa': 0.6690378217850028}
epoch 10 step 2
[2023-02-13T15-31-02] Training Step: 33/192 10.2, 	batch_size: 12 	Loss: 0.9621 	Acc: {'accuracy': tensor(0.7809), 'cohen_kappa': 0.6335743324243279}
epoch 11 step 0
[2023-02-13T15-31-04] Training Step: 34/192 11.0, 	batch_size: 12 	Loss: 0.9578 	Acc: {'accuracy': tensor(0.7878), 'cohen_kappa': 0.6794396182586451}
epoch 11 step 1
[2023-02-13T15-31-06] Training Step: 35/192 11.1, 	batch_size: 12 	Loss: 0.9286 	Acc: {'accuracy': tensor(0.8176), 'cohen_kappa': 0.6746903156680578}
epoch 11 step 2
[2023-02-13T15-31-08] Training Step: 36/192 11.2, 	batch_size: 12 	Loss: 0.9499 	Acc: {'accuracy': tensor(0.7900), 'cohen_kappa': 0.6704500809843459}

validation step 0
[2023-02-13T15-31-09] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.1176 	Acc: {'accuracy': tensor(0.6073), 'cohen_kappa': 0.37291330048210924}
validation step 1
[2023-02-13T15-31-09] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9379 	Acc: {'accuracy': tensor(0.8066), 'cohen_kappa': 0.627908830706918}
[2023-02-13T15-31-10] Validation: 	Total Loss: 1.0336 	Total Acc: {'accuracy': 0.7005212, 'cohen_kappa': 0.4921516452973367}

New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 12 step 0
[2023-02-13T15-31-13] Training Step: 37/192 12.0, 	batch_size: 12 	Loss: 0.9275 	Acc: {'accuracy': tensor(0.8184), 'cohen_kappa': 0.7166437784507591}
epoch 12 step 1
[2023-02-13T15-31-14] Training Step: 38/192 12.1, 	batch_size: 12 	Loss: 0.9830 	Acc: {'accuracy': tensor(0.7590), 'cohen_kappa': 0.6126842877945431}
epoch 12 step 2
[2023-02-13T15-31-16] Training Step: 39/192 12.2, 	batch_size: 12 	Loss: 0.8911 	Acc: {'accuracy': tensor(0.8532), 'cohen_kappa': 0.7614304053733756}
epoch 13 step 0
[2023-02-13T15-31-18] Training Step: 40/192 13.0, 	batch_size: 12 	Loss: 0.9095 	Acc: {'accuracy': tensor(0.8350), 'cohen_kappa': 0.7506030131123372}
epoch 13 step 1
[2023-02-13T15-31-20] Training Step: 41/192 13.1, 	batch_size: 12 	Loss: 0.9926 	Acc: {'accuracy': tensor(0.7501), 'cohen_kappa': 0.6103423382645203}
epoch 13 step 2
[2023-02-13T15-31-22] Training Step: 42/192 13.2, 	batch_size: 12 	Loss: 0.9717 	Acc: {'accuracy': tensor(0.7708), 'cohen_kappa': 0.6309537344687672}

validation step 0
[2023-02-13T15-31-23] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.0275 	Acc: {'accuracy': tensor(0.7134), 'cohen_kappa': 0.4990206300740485}
validation step 1
[2023-02-13T15-31-23] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.1336 	Acc: {'accuracy': tensor(0.6037), 'cohen_kappa': 0.2560782122857662}
[2023-02-13T15-31-24] Validation: 	Total Loss: 1.0771 	Total Acc: {'accuracy': 0.6620851, 'cohen_kappa': 0.3854184357259929}

epoch 14 step 0
[2023-02-13T15-31-25] Training Step: 43/192 14.0, 	batch_size: 12 	Loss: 0.9303 	Acc: {'accuracy': tensor(0.8137), 'cohen_kappa': 0.7002342094296158}
epoch 14 step 1
[2023-02-13T15-31-27] Training Step: 44/192 14.1, 	batch_size: 12 	Loss: 0.9284 	Acc: {'accuracy': tensor(0.8178), 'cohen_kappa': 0.6749811491756527}
epoch 14 step 2
[2023-02-13T15-31-28] Training Step: 45/192 14.2, 	batch_size: 12 	Loss: 0.9804 	Acc: {'accuracy': tensor(0.7600), 'cohen_kappa': 0.6390093483194795}
epoch 15 step 0
[2023-02-13T15-31-31] Training Step: 46/192 15.0, 	batch_size: 12 	Loss: 0.9810 	Acc: {'accuracy': tensor(0.7608), 'cohen_kappa': 0.6011259018941029}
epoch 15 step 1
[2023-02-13T15-31-32] Training Step: 47/192 15.1, 	batch_size: 12 	Loss: 0.9969 	Acc: {'accuracy': tensor(0.7458), 'cohen_kappa': 0.5971148325880589}
epoch 15 step 2
[2023-02-13T15-31-34] Training Step: 48/192 15.2, 	batch_size: 12 	Loss: 0.9775 	Acc: {'accuracy': tensor(0.7634), 'cohen_kappa': 0.64644965800368}

validation step 0
[2023-02-13T15-31-35] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.9258 	Acc: {'accuracy': tensor(0.8217), 'cohen_kappa': 0.6933596697471072}
validation step 1
[2023-02-13T15-31-35] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.0344 	Acc: {'accuracy': tensor(0.7032), 'cohen_kappa': 0.3967198615896762}
[2023-02-13T15-31-35] Validation: 	Total Loss: 0.9766 	Total Acc: {'accuracy': 0.76629746, 'cohen_kappa': 0.5546480629208421}

New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 16 step 0
[2023-02-13T15-31-40] Training Step: 49/192 16.0, 	batch_size: 12 	Loss: 0.9072 	Acc: {'accuracy': tensor(0.8365), 'cohen_kappa': 0.7411572348413954}
epoch 16 step 1
[2023-02-13T15-31-42] Training Step: 50/192 16.1, 	batch_size: 12 	Loss: 0.9061 	Acc: {'accuracy': tensor(0.8384), 'cohen_kappa': 0.7408152261223787}
epoch 16 step 2
[2023-02-13T15-31-43] Training Step: 51/192 16.2, 	batch_size: 12 	Loss: 0.9671 	Acc: {'accuracy': tensor(0.7763), 'cohen_kappa': 0.6324787150886226}
epoch 17 step 0
[2023-02-13T15-31-46] Training Step: 52/192 17.0, 	batch_size: 12 	Loss: 0.9449 	Acc: {'accuracy': tensor(0.7960), 'cohen_kappa': 0.6890508872487974}
epoch 17 step 1
[2023-02-13T15-31-48] Training Step: 53/192 17.1, 	batch_size: 12 	Loss: 0.9047 	Acc: {'accuracy': tensor(0.8385), 'cohen_kappa': 0.7474719785004946}
epoch 17 step 2
[2023-02-13T15-31-49] Training Step: 54/192 17.2, 	batch_size: 12 	Loss: 0.9182 	Acc: {'accuracy': tensor(0.8299), 'cohen_kappa': 0.7080361702570148}

validation step 0
[2023-02-13T15-31-51] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.9965 	Acc: {'accuracy': tensor(0.7502), 'cohen_kappa': 0.5853288541168569}
validation step 1
[2023-02-13T15-31-51] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9727 	Acc: {'accuracy': tensor(0.7666), 'cohen_kappa': 0.5230621528249373}
[2023-02-13T15-31-51] Validation: 	Total Loss: 0.9854 	Total Acc: {'accuracy': 0.75782984, 'cohen_kappa': 0.5562123500251693}

New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 18 step 0
[2023-02-13T15-31-54] Training Step: 55/192 18.0, 	batch_size: 12 	Loss: 0.9477 	Acc: {'accuracy': tensor(0.8014), 'cohen_kappa': 0.6619511495632975}
epoch 18 step 1
[2023-02-13T15-31-55] Training Step: 56/192 18.1, 	batch_size: 12 	Loss: 0.9407 	Acc: {'accuracy': tensor(0.8045), 'cohen_kappa': 0.7045306709947103}
epoch 18 step 2
[2023-02-13T15-31-57] Training Step: 57/192 18.2, 	batch_size: 12 	Loss: 0.9467 	Acc: {'accuracy': tensor(0.7983), 'cohen_kappa': 0.6778089945406558}
epoch 19 step 0
[2023-02-13T15-31-59] Training Step: 58/192 19.0, 	batch_size: 12 	Loss: 0.9264 	Acc: {'accuracy': tensor(0.8173), 'cohen_kappa': 0.7264700560914203}
epoch 19 step 1
[2023-02-13T15-32-01] Training Step: 59/192 19.1, 	batch_size: 12 	Loss: 0.8957 	Acc: {'accuracy': tensor(0.8464), 'cohen_kappa': 0.7211029659367661}
epoch 19 step 2
[2023-02-13T15-32-03] Training Step: 60/192 19.2, 	batch_size: 12 	Loss: 0.9155 	Acc: {'accuracy': tensor(0.8271), 'cohen_kappa': 0.7264010708306672}

validation step 0
[2023-02-13T15-32-04] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.9064 	Acc: {'accuracy': tensor(0.8483), 'cohen_kappa': 0.7476466026801813}
validation step 1
[2023-02-13T15-32-04] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.0271 	Acc: {'accuracy': tensor(0.7138), 'cohen_kappa': 0.502610188849713}
[2023-02-13T15-32-04] Validation: 	Total Loss: 0.9628 	Total Acc: {'accuracy': 0.7854378, 'cohen_kappa': 0.6330652357878078}

New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 20 step 0
[2023-02-13T15-32-09] Training Step: 61/192 20.0, 	batch_size: 12 	Loss: 0.9451 	Acc: {'accuracy': tensor(0.7965), 'cohen_kappa': 0.6884061726464226}
epoch 20 step 1
[2023-02-13T15-32-11] Training Step: 62/192 20.1, 	batch_size: 12 	Loss: 0.8826 	Acc: {'accuracy': tensor(0.8611), 'cohen_kappa': 0.758370248132815}
epoch 20 step 2
[2023-02-13T15-32-13] Training Step: 63/192 20.2, 	batch_size: 12 	Loss: 0.9447 	Acc: {'accuracy': tensor(0.7961), 'cohen_kappa': 0.686921438372986}
epoch 21 step 0
[2023-02-13T15-32-15] Training Step: 64/192 21.0, 	batch_size: 12 	Loss: 0.9008 	Acc: {'accuracy': tensor(0.8409), 'cohen_kappa': 0.7508842547707282}
epoch 21 step 1
[2023-02-13T15-32-17] Training Step: 65/192 21.1, 	batch_size: 12 	Loss: 0.9449 	Acc: {'accuracy': tensor(0.7965), 'cohen_kappa': 0.6659262541326325}
epoch 21 step 2
[2023-02-13T15-32-19] Training Step: 66/192 21.2, 	batch_size: 12 	Loss: 0.9399 	Acc: {'accuracy': tensor(0.8010), 'cohen_kappa': 0.6977305712237294}

validation step 0
[2023-02-13T15-32-20] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.8512 	Acc: {'accuracy': tensor(0.8932), 'cohen_kappa': 0.8129253912230852}
validation step 1
[2023-02-13T15-32-20] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9580 	Acc: {'accuracy': tensor(0.7838), 'cohen_kappa': 0.5623318643272017}
[2023-02-13T15-32-20] Validation: 	Total Loss: 0.9012 	Total Acc: {'accuracy': 0.8420067, 'cohen_kappa': 0.695745465197981}

New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 22 step 0
[2023-02-13T15-32-25] Training Step: 67/192 22.0, 	batch_size: 12 	Loss: 0.9054 	Acc: {'accuracy': tensor(0.8369), 'cohen_kappa': 0.6924020016666204}
epoch 22 step 1
[2023-02-13T15-32-26] Training Step: 68/192 22.1, 	batch_size: 12 	Loss: 0.9208 	Acc: {'accuracy': tensor(0.8210), 'cohen_kappa': 0.7214446435444388}
epoch 22 step 2
[2023-02-13T15-32-28] Training Step: 69/192 22.2, 	batch_size: 12 	Loss: 0.9207 	Acc: {'accuracy': tensor(0.8223), 'cohen_kappa': 0.7345901652190887}
epoch 23 step 0
[2023-02-13T15-32-30] Training Step: 70/192 23.0, 	batch_size: 12 	Loss: 0.8815 	Acc: {'accuracy': tensor(0.8617), 'cohen_kappa': 0.7653637547300507}
epoch 23 step 1
[2023-02-13T15-32-32] Training Step: 71/192 23.1, 	batch_size: 12 	Loss: 0.9162 	Acc: {'accuracy': tensor(0.8262), 'cohen_kappa': 0.7284774865260117}
epoch 23 step 2
[2023-02-13T15-32-34] Training Step: 72/192 23.2, 	batch_size: 12 	Loss: 0.9277 	Acc: {'accuracy': tensor(0.8161), 'cohen_kappa': 0.7196271393800199}

validation step 0
[2023-02-13T15-32-35] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.0437 	Acc: {'accuracy': tensor(0.6922), 'cohen_kappa': 0.5401980982032146}
validation step 1
[2023-02-13T15-32-35] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9584 	Acc: {'accuracy': tensor(0.7784), 'cohen_kappa': 0.5731854695310097}
[2023-02-13T15-32-35] Validation: 	Total Loss: 1.0038 	Total Acc: {'accuracy': 0.7324784, 'cohen_kappa': 0.5556233080234175}

saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 24 step 0
[2023-02-13T15-32-37] Training Step: 73/192 24.0, 	batch_size: 12 	Loss: 0.9113 	Acc: {'accuracy': tensor(0.8306), 'cohen_kappa': 0.7258560516736612}
epoch 24 step 1
[2023-02-13T15-32-39] Training Step: 74/192 24.1, 	batch_size: 12 	Loss: 0.9076 	Acc: {'accuracy': tensor(0.8359), 'cohen_kappa': 0.7451256367584911}
epoch 24 step 2
[2023-02-13T15-32-41] Training Step: 75/192 24.2, 	batch_size: 12 	Loss: 0.9105 	Acc: {'accuracy': tensor(0.8322), 'cohen_kappa': 0.7330999673571206}
epoch 25 step 0
[2023-02-13T15-32-43] Training Step: 76/192 25.0, 	batch_size: 12 	Loss: 0.9072 	Acc: {'accuracy': tensor(0.8354), 'cohen_kappa': 0.7411250430168612}
epoch 25 step 1
[2023-02-13T15-32-45] Training Step: 77/192 25.1, 	batch_size: 12 	Loss: 0.8860 	Acc: {'accuracy': tensor(0.8598), 'cohen_kappa': 0.7832929728974216}
epoch 25 step 2
[2023-02-13T15-32-46] Training Step: 78/192 25.2, 	batch_size: 12 	Loss: 0.9418 	Acc: {'accuracy': tensor(0.7981), 'cohen_kappa': 0.6709183786183947}

validation step 0
[2023-02-13T15-32-48] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.8908 	Acc: {'accuracy': tensor(0.8615), 'cohen_kappa': 0.76976191160364}
validation step 1
[2023-02-13T15-32-48] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9800 	Acc: {'accuracy': tensor(0.7633), 'cohen_kappa': 0.5884229627519881}
[2023-02-13T15-32-48] Validation: 	Total Loss: 0.9325 	Total Acc: {'accuracy': 0.81558216, 'cohen_kappa': 0.6849660875649155}

epoch 26 step 0
[2023-02-13T15-32-50] Training Step: 79/192 26.0, 	batch_size: 12 	Loss: 0.9134 	Acc: {'accuracy': tensor(0.8309), 'cohen_kappa': 0.6976810192285832}
epoch 26 step 1
[2023-02-13T15-32-51] Training Step: 80/192 26.1, 	batch_size: 12 	Loss: 0.9072 	Acc: {'accuracy': tensor(0.8352), 'cohen_kappa': 0.7490830466060858}
epoch 26 step 2
[2023-02-13T15-32-53] Training Step: 81/192 26.2, 	batch_size: 12 	Loss: 0.9343 	Acc: {'accuracy': tensor(0.8086), 'cohen_kappa': 0.700969530020435}
epoch 27 step 0
[2023-02-13T15-32-55] Training Step: 82/192 27.0, 	batch_size: 12 	Loss: 0.9017 	Acc: {'accuracy': tensor(0.8407), 'cohen_kappa': 0.732908722088}
epoch 27 step 1
[2023-02-13T15-32-56] Training Step: 83/192 27.1, 	batch_size: 12 	Loss: 0.9279 	Acc: {'accuracy': tensor(0.8142), 'cohen_kappa': 0.6916977958206463}
epoch 27 step 2
[2023-02-13T15-32-58] Training Step: 84/192 27.2, 	batch_size: 12 	Loss: 0.9111 	Acc: {'accuracy': tensor(0.8303), 'cohen_kappa': 0.7361887330104406}

validation step 0
[2023-02-13T15-32-59] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.9558 	Acc: {'accuracy': tensor(0.7890), 'cohen_kappa': 0.6476109860558075}
validation step 1
[2023-02-13T15-33-00] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9472 	Acc: {'accuracy': tensor(0.7940), 'cohen_kappa': 0.5993394109905691}
[2023-02-13T15-33-00] Validation: 	Total Loss: 0.9518 	Total Acc: {'accuracy': 0.7913485, 'cohen_kappa': 0.625038736623323}

epoch 28 step 0
[2023-02-13T15-33-01] Training Step: 85/192 28.0, 	batch_size: 12 	Loss: 0.9064 	Acc: {'accuracy': tensor(0.8359), 'cohen_kappa': 0.7375208938637605}
epoch 28 step 1
[2023-02-13T15-33-03] Training Step: 86/192 28.1, 	batch_size: 12 	Loss: 0.8919 	Acc: {'accuracy': tensor(0.8511), 'cohen_kappa': 0.7650360091370365}
epoch 28 step 2
[2023-02-13T15-33-05] Training Step: 87/192 28.2, 	batch_size: 12 	Loss: 0.9126 	Acc: {'accuracy': tensor(0.8306), 'cohen_kappa': 0.7373209726110648}
epoch 29 step 0
[2023-02-13T15-33-07] Training Step: 88/192 29.0, 	batch_size: 12 	Loss: 0.9338 	Acc: {'accuracy': tensor(0.8071), 'cohen_kappa': 0.7067882347720389}
epoch 29 step 1
[2023-02-13T15-33-08] Training Step: 89/192 29.1, 	batch_size: 12 	Loss: 0.8941 	Acc: {'accuracy': tensor(0.8497), 'cohen_kappa': 0.7426306433516969}
epoch 29 step 2
[2023-02-13T15-33-10] Training Step: 90/192 29.2, 	batch_size: 12 	Loss: 0.9107 	Acc: {'accuracy': tensor(0.8319), 'cohen_kappa': 0.7357869798525072}

validation step 0
[2023-02-13T15-33-11] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.0531 	Acc: {'accuracy': tensor(0.6812), 'cohen_kappa': 0.5007139319055034}
validation step 1
[2023-02-13T15-33-11] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9937 	Acc: {'accuracy': tensor(0.7462), 'cohen_kappa': 0.45572799104324213}
[2023-02-13T15-33-12] Validation: 	Total Loss: 1.0253 	Total Acc: {'accuracy': 0.7116252, 'cohen_kappa': 0.4796780763999473}

epoch 30 step 0
[2023-02-13T15-33-13] Training Step: 91/192 30.0, 	batch_size: 12 	Loss: 0.9445 	Acc: {'accuracy': tensor(0.7981), 'cohen_kappa': 0.6956995680521212}
epoch 30 step 1
[2023-02-13T15-33-15] Training Step: 92/192 30.1, 	batch_size: 12 	Loss: 0.9210 	Acc: {'accuracy': tensor(0.8225), 'cohen_kappa': 0.693554272574971}
epoch 30 step 2
[2023-02-13T15-33-17] Training Step: 93/192 30.2, 	batch_size: 12 	Loss: 0.9031 	Acc: {'accuracy': tensor(0.8390), 'cohen_kappa': 0.7265383835484511}
epoch 31 step 0
[2023-02-13T15-33-19] Training Step: 94/192 31.0, 	batch_size: 12 	Loss: 0.8890 	Acc: {'accuracy': tensor(0.8536), 'cohen_kappa': 0.7424369072693489}
epoch 31 step 1
[2023-02-13T15-33-21] Training Step: 95/192 31.1, 	batch_size: 12 	Loss: 0.8885 	Acc: {'accuracy': tensor(0.8541), 'cohen_kappa': 0.7707602873667729}
epoch 31 step 2
[2023-02-13T15-33-22] Training Step: 96/192 31.2, 	batch_size: 12 	Loss: 0.8931 	Acc: {'accuracy': tensor(0.8507), 'cohen_kappa': 0.774084828391289}

validation step 0
[2023-02-13T15-33-24] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.0192 	Acc: {'accuracy': tensor(0.7248), 'cohen_kappa': 0.5193695863699821}
validation step 1
[2023-02-13T15-33-24] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.0946 	Acc: {'accuracy': tensor(0.6460), 'cohen_kappa': 0.18919767477285143}
[2023-02-13T15-33-24] Validation: 	Total Loss: 1.0545 	Total Acc: {'accuracy': 0.6879435, 'cohen_kappa': 0.3649780477864906}

saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 32 step 0
[2023-02-13T15-33-26] Training Step: 97/192 32.0, 	batch_size: 12 	Loss: 0.8860 	Acc: {'accuracy': tensor(0.8569), 'cohen_kappa': 0.7750248957732473}
epoch 32 step 1
[2023-02-13T15-33-28] Training Step: 98/192 32.1, 	batch_size: 12 	Loss: 0.8952 	Acc: {'accuracy': tensor(0.8474), 'cohen_kappa': 0.7659463058108047}
epoch 32 step 2
[2023-02-13T15-33-29] Training Step: 99/192 32.2, 	batch_size: 12 	Loss: 0.8883 	Acc: {'accuracy': tensor(0.8565), 'cohen_kappa': 0.7678961603625809}
epoch 33 step 0
[2023-02-13T15-33-32] Training Step: 100/192 33.0, 	batch_size: 12 	Loss: 0.8824 	Acc: {'accuracy': tensor(0.8597), 'cohen_kappa': 0.7752160878085741}
epoch 33 step 1
[2023-02-13T15-33-34] Training Step: 101/192 33.1, 	batch_size: 12 	Loss: 0.8910 	Acc: {'accuracy': tensor(0.8523), 'cohen_kappa': 0.7718504891955391}
epoch 33 step 2
[2023-02-13T15-33-36] Training Step: 102/192 33.2, 	batch_size: 12 	Loss: 0.8860 	Acc: {'accuracy': tensor(0.8590), 'cohen_kappa': 0.7670584029364274}

validation step 0
[2023-02-13T15-33-37] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.9607 	Acc: {'accuracy': tensor(0.7787), 'cohen_kappa': 0.6529883030801146}
validation step 1
[2023-02-13T15-33-37] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9406 	Acc: {'accuracy': tensor(0.8008), 'cohen_kappa': 0.611371459470869}
[2023-02-13T15-33-37] Validation: 	Total Loss: 0.9513 	Total Acc: {'accuracy': 0.7890247, 'cohen_kappa': 0.6335278696206507}

epoch 34 step 0
[2023-02-13T15-33-39] Training Step: 103/192 34.0, 	batch_size: 12 	Loss: 0.9018 	Acc: {'accuracy': tensor(0.8421), 'cohen_kappa': 0.7560377629704754}
epoch 34 step 1
[2023-02-13T15-33-41] Training Step: 104/192 34.1, 	batch_size: 12 	Loss: 0.8629 	Acc: {'accuracy': tensor(0.8789), 'cohen_kappa': 0.7865822668886376}
epoch 34 step 2
[2023-02-13T15-33-42] Training Step: 105/192 34.2, 	batch_size: 12 	Loss: 0.8916 	Acc: {'accuracy': tensor(0.8503), 'cohen_kappa': 0.7714599988740917}
epoch 35 step 0
[2023-02-13T15-33-44] Training Step: 106/192 35.0, 	batch_size: 12 	Loss: 0.8873 	Acc: {'accuracy': tensor(0.8552), 'cohen_kappa': 0.7722990233218369}
epoch 35 step 1
[2023-02-13T15-33-46] Training Step: 107/192 35.1, 	batch_size: 12 	Loss: 0.8835 	Acc: {'accuracy': tensor(0.8590), 'cohen_kappa': 0.7836080905996652}
epoch 35 step 2
[2023-02-13T15-33-48] Training Step: 108/192 35.2, 	batch_size: 12 	Loss: 0.9251 	Acc: {'accuracy': tensor(0.8186), 'cohen_kappa': 0.711766301636827}

validation step 0
[2023-02-13T15-33-50] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.0035 	Acc: {'accuracy': tensor(0.7276), 'cohen_kappa': 0.5365724577805897}
validation step 1
[2023-02-13T15-33-50] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.0449 	Acc: {'accuracy': tensor(0.6905), 'cohen_kappa': 0.30756499239970947}
[2023-02-13T15-33-50] Validation: 	Total Loss: 1.0229 	Total Acc: {'accuracy': 0.71026003, 'cohen_kappa': 0.42948638023925884}

epoch 36 step 0
[2023-02-13T15-33-51] Training Step: 109/192 36.0, 	batch_size: 12 	Loss: 0.9108 	Acc: {'accuracy': tensor(0.8335), 'cohen_kappa': 0.7143996699382182}
epoch 36 step 1
[2023-02-13T15-33-53] Training Step: 110/192 36.1, 	batch_size: 12 	Loss: 0.8524 	Acc: {'accuracy': tensor(0.8917), 'cohen_kappa': 0.8236461660906257}
epoch 36 step 2
[2023-02-13T15-33-55] Training Step: 111/192 36.2, 	batch_size: 12 	Loss: 0.9022 	Acc: {'accuracy': tensor(0.8407), 'cohen_kappa': 0.7625193022972058}
epoch 37 step 0
[2023-02-13T15-33-57] Training Step: 112/192 37.0, 	batch_size: 12 	Loss: 0.8693 	Acc: {'accuracy': tensor(0.8730), 'cohen_kappa': 0.7984373620851993}
epoch 37 step 1
[2023-02-13T15-33-58] Training Step: 113/192 37.1, 	batch_size: 12 	Loss: 0.9493 	Acc: {'accuracy': tensor(0.7913), 'cohen_kappa': 0.6620778111068322}
epoch 37 step 2
[2023-02-13T15-34-00] Training Step: 114/192 37.2, 	batch_size: 12 	Loss: 0.9051 	Acc: {'accuracy': tensor(0.8407), 'cohen_kappa': 0.751104493099796}

validation step 0
[2023-02-13T15-34-01] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.0933 	Acc: {'accuracy': tensor(0.6189), 'cohen_kappa': 0.3754395555655672}
validation step 1
[2023-02-13T15-34-02] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.0295 	Acc: {'accuracy': tensor(0.7030), 'cohen_kappa': 0.3528113216969123}
[2023-02-13T15-34-02] Validation: 	Total Loss: 1.0635 	Total Acc: {'accuracy': 0.6581998, 'cohen_kappa': 0.3648583773378095}

epoch 38 step 0
[2023-02-13T15-34-03] Training Step: 115/192 38.0, 	batch_size: 12 	Loss: 0.9273 	Acc: {'accuracy': tensor(0.8152), 'cohen_kappa': 0.7143851608275451}
epoch 38 step 1
[2023-02-13T15-34-05] Training Step: 116/192 38.1, 	batch_size: 12 	Loss: 0.8897 	Acc: {'accuracy': tensor(0.8537), 'cohen_kappa': 0.7603776511432832}
epoch 38 step 2
[2023-02-13T15-34-06] Training Step: 117/192 38.2, 	batch_size: 12 	Loss: 0.8907 	Acc: {'accuracy': tensor(0.8512), 'cohen_kappa': 0.7731028702869078}
epoch 39 step 0
[2023-02-13T15-34-08] Training Step: 118/192 39.0, 	batch_size: 12 	Loss: 0.8700 	Acc: {'accuracy': tensor(0.8723), 'cohen_kappa': 0.7862627600912704}
epoch 39 step 1
[2023-02-13T15-34-10] Training Step: 119/192 39.1, 	batch_size: 12 	Loss: 0.8714 	Acc: {'accuracy': tensor(0.8707), 'cohen_kappa': 0.779709779822463}
epoch 39 step 2
[2023-02-13T15-34-11] Training Step: 120/192 39.2, 	batch_size: 12 	Loss: 0.9338 	Acc: {'accuracy': tensor(0.8073), 'cohen_kappa': 0.7127169828948599}

validation step 0
[2023-02-13T15-34-13] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.8695 	Acc: {'accuracy': tensor(0.8738), 'cohen_kappa': 0.7877554263907438}
validation step 1
[2023-02-13T15-34-13] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9693 	Acc: {'accuracy': tensor(0.7716), 'cohen_kappa': 0.5944937673787141}
[2023-02-13T15-34-13] Validation: 	Total Loss: 0.9162 	Total Acc: {'accuracy': 0.82602197, 'cohen_kappa': 0.6973844292343501}

New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 40 step 0
[2023-02-13T15-34-16] Training Step: 121/192 40.0, 	batch_size: 12 	Loss: 0.8768 	Acc: {'accuracy': tensor(0.8661), 'cohen_kappa': 0.7977954325209495}
epoch 40 step 1
[2023-02-13T15-34-18] Training Step: 122/192 40.1, 	batch_size: 12 	Loss: 0.8621 	Acc: {'accuracy': tensor(0.8808), 'cohen_kappa': 0.8077586552149943}
epoch 40 step 2
[2023-02-13T15-34-20] Training Step: 123/192 40.2, 	batch_size: 12 	Loss: 0.8908 	Acc: {'accuracy': tensor(0.8513), 'cohen_kappa': 0.7577873258955465}
epoch 41 step 0
[2023-02-13T15-34-22] Training Step: 124/192 41.0, 	batch_size: 12 	Loss: 0.9027 	Acc: {'accuracy': tensor(0.8397), 'cohen_kappa': 0.7608068245950764}
epoch 41 step 1
[2023-02-13T15-34-24] Training Step: 125/192 41.1, 	batch_size: 12 	Loss: 0.9035 	Acc: {'accuracy': tensor(0.8406), 'cohen_kappa': 0.729719682281992}
epoch 41 step 2
[2023-02-13T15-34-26] Training Step: 126/192 41.2, 	batch_size: 12 	Loss: 0.8698 	Acc: {'accuracy': tensor(0.8738), 'cohen_kappa': 0.7908415931430759}

validation step 0
[2023-02-13T15-34-27] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.8748 	Acc: {'accuracy': tensor(0.8692), 'cohen_kappa': 0.7769642285955033}
validation step 1
[2023-02-13T15-34-27] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9476 	Acc: {'accuracy': tensor(0.7931), 'cohen_kappa': 0.6003806067229731}
[2023-02-13T15-34-28] Validation: 	Total Loss: 0.9088 	Total Acc: {'accuracy': 0.83360857, 'cohen_kappa': 0.6943920408595241}

epoch 42 step 0
[2023-02-13T15-34-29] Training Step: 127/192 42.0, 	batch_size: 12 	Loss: 0.8837 	Acc: {'accuracy': tensor(0.8583), 'cohen_kappa': 0.7662791148103731}
epoch 42 step 1
[2023-02-13T15-34-31] Training Step: 128/192 42.1, 	batch_size: 12 	Loss: 0.8785 	Acc: {'accuracy': tensor(0.8626), 'cohen_kappa': 0.7794225728681328}
epoch 42 step 2
[2023-02-13T15-34-33] Training Step: 129/192 42.2, 	batch_size: 12 	Loss: 0.9231 	Acc: {'accuracy': tensor(0.8189), 'cohen_kappa': 0.7212535679174785}
epoch 43 step 0
[2023-02-13T15-34-35] Training Step: 130/192 43.0, 	batch_size: 12 	Loss: 0.8819 	Acc: {'accuracy': tensor(0.8601), 'cohen_kappa': 0.7605760051182822}
epoch 43 step 1
[2023-02-13T15-34-37] Training Step: 131/192 43.1, 	batch_size: 12 	Loss: 0.9123 	Acc: {'accuracy': tensor(0.8306), 'cohen_kappa': 0.7426007639020975}
epoch 43 step 2
[2023-02-13T15-34-39] Training Step: 132/192 43.2, 	batch_size: 12 	Loss: 0.8940 	Acc: {'accuracy': tensor(0.8478), 'cohen_kappa': 0.767346267780769}

validation step 0
[2023-02-13T15-34-41] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.8268 	Acc: {'accuracy': tensor(0.9158), 'cohen_kappa': 0.8527111233761231}
validation step 1
[2023-02-13T15-34-41] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9458 	Acc: {'accuracy': tensor(0.7939), 'cohen_kappa': 0.5875196672609027}
[2023-02-13T15-34-41] Validation: 	Total Loss: 0.8824 	Total Acc: {'accuracy': 0.858815, 'cohen_kappa': 0.7287050662564503}

New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 44 step 0
[2023-02-13T15-34-45] Training Step: 133/192 44.0, 	batch_size: 12 	Loss: 0.8786 	Acc: {'accuracy': tensor(0.8639), 'cohen_kappa': 0.7941555302863819}
epoch 44 step 1
[2023-02-13T15-34-47] Training Step: 134/192 44.1, 	batch_size: 12 	Loss: 0.9001 	Acc: {'accuracy': tensor(0.8429), 'cohen_kappa': 0.7465306699202661}
epoch 44 step 2
[2023-02-13T15-34-48] Training Step: 135/192 44.2, 	batch_size: 12 	Loss: 0.8749 	Acc: {'accuracy': tensor(0.8671), 'cohen_kappa': 0.7718372683147114}
epoch 45 step 0
[2023-02-13T15-34-51] Training Step: 136/192 45.0, 	batch_size: 12 	Loss: 0.9159 	Acc: {'accuracy': tensor(0.8248), 'cohen_kappa': 0.7325705799877329}
epoch 45 step 1
[2023-02-13T15-34-52] Training Step: 137/192 45.1, 	batch_size: 12 	Loss: 0.8859 	Acc: {'accuracy': tensor(0.8570), 'cohen_kappa': 0.752529734330956}
epoch 45 step 2
[2023-02-13T15-34-54] Training Step: 138/192 45.2, 	batch_size: 12 	Loss: 0.8527 	Acc: {'accuracy': tensor(0.8904), 'cohen_kappa': 0.8230836404047003}

validation step 0
[2023-02-13T15-34-55] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.0047 	Acc: {'accuracy': tensor(0.7587), 'cohen_kappa': 0.5955601680925017}
validation step 1
[2023-02-13T15-34-56] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9947 	Acc: {'accuracy': tensor(0.7452), 'cohen_kappa': 0.4434123842807809}
[2023-02-13T15-34-56] Validation: 	Total Loss: 1.0001 	Total Acc: {'accuracy': 0.75236475, 'cohen_kappa': 0.5244144115648464}

epoch 46 step 0
[2023-02-13T15-34-57] Training Step: 139/192 46.0, 	batch_size: 12 	Loss: 0.9026 	Acc: {'accuracy': tensor(0.8409), 'cohen_kappa': 0.7551895802651266}
epoch 46 step 1
[2023-02-13T15-34-59] Training Step: 140/192 46.1, 	batch_size: 12 	Loss: 0.8764 	Acc: {'accuracy': tensor(0.8650), 'cohen_kappa': 0.7838902443764181}
epoch 46 step 2
[2023-02-13T15-35-00] Training Step: 141/192 46.2, 	batch_size: 12 	Loss: 0.8826 	Acc: {'accuracy': tensor(0.8599), 'cohen_kappa': 0.7807473826717046}
epoch 47 step 0
[2023-02-13T15-35-03] Training Step: 142/192 47.0, 	batch_size: 12 	Loss: 0.9044 	Acc: {'accuracy': tensor(0.8397), 'cohen_kappa': 0.7383520624233151}
epoch 47 step 1
[2023-02-13T15-35-04] Training Step: 143/192 47.1, 	batch_size: 12 	Loss: 0.8909 	Acc: {'accuracy': tensor(0.8530), 'cohen_kappa': 0.7766442827616414}
epoch 47 step 2
[2023-02-13T15-35-06] Training Step: 144/192 47.2, 	batch_size: 12 	Loss: 0.8926 	Acc: {'accuracy': tensor(0.8495), 'cohen_kappa': 0.7551329272668602}

validation step 0
[2023-02-13T15-35-07] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.8839 	Acc: {'accuracy': tensor(0.8577), 'cohen_kappa': 0.7622122320447309}
validation step 1
[2023-02-13T15-35-08] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9491 	Acc: {'accuracy': tensor(0.7909), 'cohen_kappa': 0.5860742081024162}
[2023-02-13T15-35-08] Validation: 	Total Loss: 0.9144 	Total Acc: {'accuracy': 0.8264491, 'cohen_kappa': 0.6798484101558123}

saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 48 step 0
[2023-02-13T15-35-10] Training Step: 145/192 48.0, 	batch_size: 12 	Loss: 0.8765 	Acc: {'accuracy': tensor(0.8655), 'cohen_kappa': 0.7837166168258277}
epoch 48 step 1
[2023-02-13T15-35-11] Training Step: 146/192 48.1, 	batch_size: 12 	Loss: 0.8838 	Acc: {'accuracy': tensor(0.8583), 'cohen_kappa': 0.7819388447096857}
epoch 48 step 2
[2023-02-13T15-35-13] Training Step: 147/192 48.2, 	batch_size: 12 	Loss: 0.8550 	Acc: {'accuracy': tensor(0.8882), 'cohen_kappa': 0.8214588002457232}
epoch 49 step 0
[2023-02-13T15-35-15] Training Step: 148/192 49.0, 	batch_size: 12 	Loss: 0.8899 	Acc: {'accuracy': tensor(0.8524), 'cohen_kappa': 0.7721353962862763}
epoch 49 step 1
[2023-02-13T15-35-16] Training Step: 149/192 49.1, 	batch_size: 12 	Loss: 0.8831 	Acc: {'accuracy': tensor(0.8606), 'cohen_kappa': 0.7541379782904627}
epoch 49 step 2
[2023-02-13T15-35-18] Training Step: 150/192 49.2, 	batch_size: 12 	Loss: 0.8699 	Acc: {'accuracy': tensor(0.8729), 'cohen_kappa': 0.8069924373796857}

validation step 0
[2023-02-13T15-35-20] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.9680 	Acc: {'accuracy': tensor(0.7779), 'cohen_kappa': 0.6510732434906241}
validation step 1
[2023-02-13T15-35-20] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9496 	Acc: {'accuracy': tensor(0.7902), 'cohen_kappa': 0.5951325316097944}
[2023-02-13T15-35-20] Validation: 	Total Loss: 0.9594 	Total Acc: {'accuracy': 0.7836454, 'cohen_kappa': 0.6249148324526933}

epoch 50 step 0
[2023-02-13T15-35-22] Training Step: 151/192 50.0, 	batch_size: 12 	Loss: 0.8429 	Acc: {'accuracy': tensor(0.9003), 'cohen_kappa': 0.846054296680194}
epoch 50 step 1
[2023-02-13T15-35-23] Training Step: 152/192 50.1, 	batch_size: 12 	Loss: 0.8932 	Acc: {'accuracy': tensor(0.8493), 'cohen_kappa': 0.7606021740591531}
epoch 50 step 2
[2023-02-13T15-35-25] Training Step: 153/192 50.2, 	batch_size: 12 	Loss: 0.8840 	Acc: {'accuracy': tensor(0.8583), 'cohen_kappa': 0.762017046041138}
epoch 51 step 0
[2023-02-13T15-35-27] Training Step: 154/192 51.0, 	batch_size: 12 	Loss: 0.8556 	Acc: {'accuracy': tensor(0.8873), 'cohen_kappa': 0.8224252509709693}
epoch 51 step 1
[2023-02-13T15-35-29] Training Step: 155/192 51.1, 	batch_size: 12 	Loss: 0.8735 	Acc: {'accuracy': tensor(0.8682), 'cohen_kappa': 0.7962875640871826}
epoch 51 step 2
[2023-02-13T15-35-30] Training Step: 156/192 51.2, 	batch_size: 12 	Loss: 0.8672 	Acc: {'accuracy': tensor(0.8758), 'cohen_kappa': 0.7973093801376726}

validation step 0
[2023-02-13T15-35-32] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.9794 	Acc: {'accuracy': tensor(0.7546), 'cohen_kappa': 0.575120201410892}
validation step 1
[2023-02-13T15-35-32] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.0485 	Acc: {'accuracy': tensor(0.6862), 'cohen_kappa': 0.30063642132551704}
[2023-02-13T15-35-32] Validation: 	Total Loss: 1.0117 	Total Acc: {'accuracy': 0.72262514, 'cohen_kappa': 0.44676896487293194}

epoch 52 step 0
[2023-02-13T15-35-34] Training Step: 157/192 52.0, 	batch_size: 12 	Loss: 0.8805 	Acc: {'accuracy': tensor(0.8666), 'cohen_kappa': 0.7758320755032168}
epoch 52 step 1
[2023-02-13T15-35-36] Training Step: 158/192 52.1, 	batch_size: 12 	Loss: 0.8375 	Acc: {'accuracy': tensor(0.9052), 'cohen_kappa': 0.8501358718271644}
epoch 52 step 2
[2023-02-13T15-35-37] Training Step: 159/192 52.2, 	batch_size: 12 	Loss: 0.8923 	Acc: {'accuracy': tensor(0.8501), 'cohen_kappa': 0.7689108408884857}
epoch 53 step 0
[2023-02-13T15-35-39] Training Step: 160/192 53.0, 	batch_size: 12 	Loss: 0.8939 	Acc: {'accuracy': tensor(0.8516), 'cohen_kappa': 0.7697062683022701}
epoch 53 step 1
[2023-02-13T15-35-41] Training Step: 161/192 53.1, 	batch_size: 12 	Loss: 0.8743 	Acc: {'accuracy': tensor(0.8678), 'cohen_kappa': 0.7974120909307408}
epoch 53 step 2
[2023-02-13T15-35-42] Training Step: 162/192 53.2, 	batch_size: 12 	Loss: 0.8500 	Acc: {'accuracy': tensor(0.8928), 'cohen_kappa': 0.8180821910266342}

validation step 0
[2023-02-13T15-35-44] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.9246 	Acc: {'accuracy': tensor(0.8238), 'cohen_kappa': 0.7092738284286644}
validation step 1
[2023-02-13T15-35-44] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9587 	Acc: {'accuracy': tensor(0.7825), 'cohen_kappa': 0.5499626942061473}
[2023-02-13T15-35-44] Validation: 	Total Loss: 0.9406 	Total Acc: {'accuracy': 0.8044964, 'cohen_kappa': 0.63477842084806}

epoch 54 step 0
[2023-02-13T15-35-46] Training Step: 163/192 54.0, 	batch_size: 12 	Loss: 0.8678 	Acc: {'accuracy': tensor(0.8742), 'cohen_kappa': 0.7979842448046471}
epoch 54 step 1
[2023-02-13T15-35-47] Training Step: 164/192 54.1, 	batch_size: 12 	Loss: 0.8547 	Acc: {'accuracy': tensor(0.8898), 'cohen_kappa': 0.8335461403333035}
epoch 54 step 2
[2023-02-13T15-35-49] Training Step: 165/192 54.2, 	batch_size: 12 	Loss: 0.8854 	Acc: {'accuracy': tensor(0.8572), 'cohen_kappa': 0.7593215894813435}
epoch 55 step 0
[2023-02-13T15-35-51] Training Step: 166/192 55.0, 	batch_size: 12 	Loss: 0.8874 	Acc: {'accuracy': tensor(0.8562), 'cohen_kappa': 0.7723535081903738}
epoch 55 step 1
[2023-02-13T15-35-52] Training Step: 167/192 55.1, 	batch_size: 12 	Loss: 0.8683 	Acc: {'accuracy': tensor(0.8745), 'cohen_kappa': 0.7938709453507239}
epoch 55 step 2
[2023-02-13T15-35-54] Training Step: 168/192 55.2, 	batch_size: 12 	Loss: 0.8976 	Acc: {'accuracy': tensor(0.8451), 'cohen_kappa': 0.7540273620293882}

validation step 0
[2023-02-13T15-35-56] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.9432 	Acc: {'accuracy': tensor(0.7957), 'cohen_kappa': 0.6732921400265585}
validation step 1
[2023-02-13T15-35-56] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9425 	Acc: {'accuracy': tensor(0.7983), 'cohen_kappa': 0.605914024397048}
[2023-02-13T15-35-56] Validation: 	Total Loss: 0.9429 	Total Acc: {'accuracy': 0.7969221, 'cohen_kappa': 0.6417854897831254}

saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 56 step 0
[2023-02-13T15-35-58] Training Step: 169/192 56.0, 	batch_size: 12 	Loss: 0.8547 	Acc: {'accuracy': tensor(0.8878), 'cohen_kappa': 0.8230902366093013}
epoch 56 step 1
[2023-02-13T15-35-59] Training Step: 170/192 56.1, 	batch_size: 12 	Loss: 0.8772 	Acc: {'accuracy': tensor(0.8649), 'cohen_kappa': 0.786548842697036}
epoch 56 step 2
[2023-02-13T15-36-01] Training Step: 171/192 56.2, 	batch_size: 12 	Loss: 0.8509 	Acc: {'accuracy': tensor(0.8913), 'cohen_kappa': 0.830077996007112}
epoch 57 step 0
[2023-02-13T15-36-03] Training Step: 172/192 57.0, 	batch_size: 12 	Loss: 0.8654 	Acc: {'accuracy': tensor(0.8769), 'cohen_kappa': 0.8009668754259991}
epoch 57 step 1
[2023-02-13T15-36-05] Training Step: 173/192 57.1, 	batch_size: 12 	Loss: 0.8785 	Acc: {'accuracy': tensor(0.8634), 'cohen_kappa': 0.77522015666332}
epoch 57 step 2
[2023-02-13T15-36-07] Training Step: 174/192 57.2, 	batch_size: 12 	Loss: 0.8602 	Acc: {'accuracy': tensor(0.8817), 'cohen_kappa': 0.8217523507343993}

validation step 0
[2023-02-13T15-36-08] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.8616 	Acc: {'accuracy': tensor(0.8823), 'cohen_kappa': 0.7992412217989857}
validation step 1
[2023-02-13T15-36-08] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9498 	Acc: {'accuracy': tensor(0.7905), 'cohen_kappa': 0.564126666085013}
[2023-02-13T15-36-09] Validation: 	Total Loss: 0.9028 	Total Acc: {'accuracy': 0.8393599, 'cohen_kappa': 0.6892994105029768}

epoch 58 step 0
[2023-02-13T15-36-10] Training Step: 175/192 58.0, 	batch_size: 12 	Loss: 0.8533 	Acc: {'accuracy': tensor(0.8897), 'cohen_kappa': 0.833603056308877}
epoch 58 step 1
[2023-02-13T15-36-12] Training Step: 176/192 58.1, 	batch_size: 12 	Loss: 0.8648 	Acc: {'accuracy': tensor(0.8780), 'cohen_kappa': 0.7985609395052748}
epoch 58 step 2
[2023-02-13T15-36-14] Training Step: 177/192 58.2, 	batch_size: 12 	Loss: 0.8808 	Acc: {'accuracy': tensor(0.8610), 'cohen_kappa': 0.7714629032613715}
epoch 59 step 0
[2023-02-13T15-36-16] Training Step: 178/192 59.0, 	batch_size: 12 	Loss: 0.8555 	Acc: {'accuracy': tensor(0.8882), 'cohen_kappa': 0.8184691485459297}
epoch 59 step 1
[2023-02-13T15-36-18] Training Step: 179/192 59.1, 	batch_size: 12 	Loss: 0.8860 	Acc: {'accuracy': tensor(0.8563), 'cohen_kappa': 0.7682692899886663}
epoch 59 step 2
[2023-02-13T15-36-19] Training Step: 180/192 59.2, 	batch_size: 12 	Loss: 0.8721 	Acc: {'accuracy': tensor(0.8706), 'cohen_kappa': 0.8013438561506172}

validation step 0
[2023-02-13T15-36-21] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.8891 	Acc: {'accuracy': tensor(0.8545), 'cohen_kappa': 0.7604057187398828}
validation step 1
[2023-02-13T15-36-21] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9392 	Acc: {'accuracy': tensor(0.8025), 'cohen_kappa': 0.6049966526187271}
[2023-02-13T15-36-21] Validation: 	Total Loss: 0.9126 	Total Acc: {'accuracy': 0.8301872, 'cohen_kappa': 0.6877349554632488}

epoch 60 step 0
[2023-02-13T15-36-22] Training Step: 181/192 60.0, 	batch_size: 12 	Loss: 0.8743 	Acc: {'accuracy': tensor(0.8678), 'cohen_kappa': 0.7713323152300866}
epoch 60 step 1
[2023-02-13T15-36-24] Training Step: 182/192 60.1, 	batch_size: 12 	Loss: 0.8789 	Acc: {'accuracy': tensor(0.8642), 'cohen_kappa': 0.7921916282357739}
epoch 60 step 2
[2023-02-13T15-36-26] Training Step: 183/192 60.2, 	batch_size: 12 	Loss: 0.8409 	Acc: {'accuracy': tensor(0.9014), 'cohen_kappa': 0.8403233046193458}
epoch 61 step 0
[2023-02-13T15-36-28] Training Step: 184/192 61.0, 	batch_size: 12 	Loss: 0.8806 	Acc: {'accuracy': tensor(0.8610), 'cohen_kappa': 0.7756397329802833}
epoch 61 step 1
[2023-02-13T15-36-29] Training Step: 185/192 61.1, 	batch_size: 12 	Loss: 0.8299 	Acc: {'accuracy': tensor(0.9133), 'cohen_kappa': 0.8577537559934313}
epoch 61 step 2
[2023-02-13T15-36-31] Training Step: 186/192 61.2, 	batch_size: 12 	Loss: 0.8698 	Acc: {'accuracy': tensor(0.8722), 'cohen_kappa': 0.8040348443916918}

validation step 0
[2023-02-13T15-36-33] Validation Step: 1/2, 	batch_size: 4 	Loss: 0.9866 	Acc: {'accuracy': tensor(0.7532), 'cohen_kappa': 0.585845921612177}
validation step 1
[2023-02-13T15-36-33] Validation Step: 2/2, 	batch_size: 4 	Loss: 1.0064 	Acc: {'accuracy': tensor(0.7260), 'cohen_kappa': 0.40695890633576015}
[2023-02-13T15-36-33] Validation: 	Total Loss: 0.9958 	Total Acc: {'accuracy': 0.74047786, 'cohen_kappa': 0.5021966451260709}

epoch 62 step 0
[2023-02-13T15-36-34] Training Step: 187/192 62.0, 	batch_size: 12 	Loss: 0.8803 	Acc: {'accuracy': tensor(0.8625), 'cohen_kappa': 0.7648468944182463}
epoch 62 step 1
[2023-02-13T15-36-36] Training Step: 188/192 62.1, 	batch_size: 12 	Loss: 0.8553 	Acc: {'accuracy': tensor(0.8874), 'cohen_kappa': 0.8104378903968414}
epoch 62 step 2
[2023-02-13T15-36-37] Training Step: 189/192 62.2, 	batch_size: 12 	Loss: 0.8808 	Acc: {'accuracy': tensor(0.8607), 'cohen_kappa': 0.787797484498777}
epoch 63 step 0
[2023-02-13T15-36-40] Training Step: 190/192 63.0, 	batch_size: 12 	Loss: 0.8507 	Acc: {'accuracy': tensor(0.8917), 'cohen_kappa': 0.8286021139700592}
epoch 63 step 1
[2023-02-13T15-36-41] Training Step: 191/192 63.1, 	batch_size: 12 	Loss: 0.8727 	Acc: {'accuracy': tensor(0.8693), 'cohen_kappa': 0.7869636345193906}
epoch 63 step 2
[2023-02-13T15-36-43] Training Step: 192/192 63.2, 	batch_size: 12 	Loss: 0.8672 	Acc: {'accuracy': tensor(0.8759), 'cohen_kappa': 0.79844451165435}

validation step 0
[2023-02-13T15-36-44] Validation Step: 1/2, 	batch_size: 4 	Loss: 1.0140 	Acc: {'accuracy': tensor(0.7332), 'cohen_kappa': 0.5853699656118463}
validation step 1
[2023-02-13T15-36-44] Validation Step: 2/2, 	batch_size: 4 	Loss: 0.9351 	Acc: {'accuracy': tensor(0.8037), 'cohen_kappa': 0.6095376014186584}
[2023-02-13T15-36-44] Validation: 	Total Loss: 0.9771 	Total Acc: {'accuracy': 0.7661687, 'cohen_kappa': 0.5966709828852115}

saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saving final checkpoint!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saving inference model
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saving entire model
saveME_torch: start saving of the entire model!
saveME_torch: saved
Out[19]:
True
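
Looking at the log above, checkpoints were written after epochs 47, 55 and 63, which suggests a periodic checkpointing rule with an interval of eight epochs. This is only an inference from the log, not the framework's documented behavior; a minimal sketch of such a rule (with `should_checkpoint` as an illustrative helper, not a GEM ML function) could look like:

```python
CHECKPOINT_EVERY = 8  # assumed interval, judging from the log above

def should_checkpoint(epoch, interval=CHECKPOINT_EVERY):
    """Periodic checkpointing rule: save at the end of every `interval`-th epoch
    (epochs are counted from zero, hence the +1)."""
    return (epoch + 1) % interval == 0
```

Under this assumption, epochs 47, 55 and 63 all trigger a save, matching the `saveME_checkpoint` messages in the log.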

Testing¶

After the long training run, we test our model on the chosen test tiles.

We load the model that achieved the best validation loss, using the BaseClass from AugmentME.

In [20]:
model = AugmentME.BaseClass(mode="torch")
model.load(os.path.join(config["dir_results"],config["model_savename_bestloss"]))
loadME_torch: start loading of the entire model!
loadME_torch: loaded
Out[20]:
True

Since the testing loop is rather similar to the validation part of our training procedure, we do not discuss it in detail here.

In [30]:
#%% testing loop
print('Start testing...')
model.eval()
losss_test = []
accs_test = []
weights_test = []
with torch.no_grad():              
    for step_test, (x_test, y_test, mask_test, idx_test) in enumerate(dataloader_test):
        print('Test step %i'%(step_test))

        #%%%% clean cache of GPU
        torch.cuda.empty_cache()

        #%%%% forward pass
        if isinstance(x_test, list):
            out_test = model.forward([item_.to(config["device"]) for item_ in x_test])
        else:
            out_test = model.forward(x_test.to(config["device"]))

        #%%%% compute loss
        loss_test = loss_function(out_test.softmax(1),y_test.squeeze(1).to(config["device"]))
        loss_test = (loss_test*mask_test.long().squeeze(1).to(config["device"])).sum() / (torch.count_nonzero(mask_test.long().to(config["device"])))

        #%%%% compute metric
        if isinstance(metric, list):
            test_acc = [metric_(out_test.cpu().detach(),y_test.cpu().detach(),mask_test.cpu().detach()) for metric_ in metric]
        else:
            test_acc = metric(out_test.cpu().detach(),y_test.cpu().detach(),mask_test.cpu().detach())

        #%%%% printing stuff
        print(
            "[{}] Test Step: {:d}/{:d}, \tbatch_size: {} \tLoss: {:.4f} \tAcc: {}".format(
                dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S"),
                step_test+1,
                len(dataloader_test),
                dataloader_test.batch_size,
                loss_test.mean(),
                {metric_.__name__:test_acc_ for metric_,test_acc_ in zip(metric,test_acc)} if isinstance(metric, list) else test_acc
            )
        )

        #%%%% collect loss and accuracy
        losss_test.append(loss_test.cpu().detach().numpy())
        accs_test.append(test_acc)
        weights_test.append(torch.count_nonzero(mask_test).cpu().detach().numpy())
        
        #%%%% plot
        #%%%%% calculations for plot
        prediction_test = torch.argmax(out_test,1).cpu()
        eopatches = [EOPatch.load(dataset_test.paths[idx_.cpu()]) for idx_ in idx_test[0]]
        imgs_swir = [eopatch[(FeatureType.DATA,"data")][...,[-1,-3,-4]].squeeze() for eopatch in eopatches]
        imgs_true = [eopatch[(FeatureType.DATA,"data")][...,[0,1,2]].squeeze() for eopatch in eopatches]
        
        #%%%%% batch plot
        fig, axis = plt.subplots(nrows=4, ncols=dataloader_test.batch_size, figsize=(5*dataloader_test.batch_size,5*4))
        axis[0][0].set_ylabel("Prediction")
        axis[1][0].set_ylabel("Reference")
        axis[2][0].set_ylabel("SWIR Image")
        axis[3][0].set_ylabel("True Color Image")
        for i in range(dataloader_test.batch_size):
            axis[0][i].imshow(prediction_test[i],vmin=0,vmax=config["num_classes"],cmap=config["cmap_reference"])
            axis[0][i].set_yticks([])
            axis[0][i].set_xticks([])
            
            axis[1][i].imshow(y_test.squeeze(1)[i].cpu(),vmin=0,vmax=config["num_classes"],cmap=config["cmap_reference"])
            axis[1][i].set_yticks([])
            axis[1][i].set_xticks([])
            
            axis[2][i].imshow(imgs_swir[i]*2.5)
            axis[2][i].set_yticks([])
            axis[2][i].set_xticks([])
            
            axis[3][i].imshow(imgs_true[i]*2.5)
            axis[3][i].set_yticks([])
            axis[3][i].set_xticks([])
        plt.subplots_adjust(left=0, bottom=0.05, right=1, top=0.95, wspace=0.1, hspace=0)
        plt.show()

    #%%%% total loss and accuracy
    total = np.sum([np.sum(weight_) for weight_ in weights_test])
    loss_test_total = np.sum([weight_/total*loss_ for weight_,loss_ in zip(weights_test,losss_test)])
    if isinstance(metric, list):
        acc_test_total = [np.sum([weight_/total*acc_[i] for weight_,acc_ in zip(weights_test,accs_test)]) for i in range(len(metric))]
    else:
        acc_test_total = np.sum([weight_/total*acc_ for weight_,acc_ in zip(weights_test,accs_test)])

    # print total values
    print(
        "[{}] Test: \tTotal Loss: {:.4f} \tTotal Acc: {}".format(
            dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S"),
            loss_test_total,
            {metric_.__name__:test_acc_ for metric_,test_acc_ in zip(metric,acc_test_total)} if isinstance(metric, list) else acc_test_total
        )
    )

    #%%% write to tensorboard
    #%%%% log loss
    writer.add_scalar(f'LossTest/{type(loss_function).__name__}', loss_test_total, global_step=step_test)

    #%%%% log metric
    if isinstance(metric, list):
        writer.add_scalars('AccuracyTest',{metric_.__name__:test_acc_ for metric_,test_acc_ in zip(metric,acc_test_total)},global_step=step_test)
    else:
        writer.add_scalar('AccuracyTest', acc_test_total, global_step=step_test)
print()
Start testing...
Test step 0
[2023-02-13T15-44-07] Test Step: 1/2, 	batch_size: 4 	Loss: 0.9858 	Acc: {'accuracy': tensor(0.7514), 'cohen_kappa': 0.5258390902529031}
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Test step 1
[2023-02-13T15-44-09] Test Step: 2/2, 	batch_size: 4 	Loss: 0.9805 	Acc: {'accuracy': tensor(0.7578), 'cohen_kappa': 0.6144480004861541}
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
[2023-02-13T15-44-11] Test: 	Total Loss: 0.9832 	Total Acc: {'accuracy': 0.754561, 'cohen_kappa': 0.5697383328759842}
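
The total loss and accuracy above are not plain means over the batches: each batch is weighted by its number of valid pixels, as collected in `weights_test`. A minimal sketch of that aggregation, with `weighted_total` as an illustrative helper rather than a framework function:

```python
import numpy as np

def weighted_total(values, weights):
    """Combine per-batch values into a total, weighting each batch by its
    share of the overall valid-pixel count."""
    total = np.sum(weights)
    return float(np.sum([w / total * v for w, v in zip(weights, values)]))
```

With the two batch losses from the output above and equal weights, `weighted_total([0.9858, 0.9805], [1, 1])` gives 0.98315, i.e. the reported total loss of 0.9832.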

Our model has now been trained and tested, so we free the GPU by deleting the model and its associated variables.

In [31]:
del model
del optimizer
del x, y, mask
del x_validation, y_validation, mask_validation
del x_test, y_test, mask_test
del loss, loss_validation, loss_test
del grad
torch.cuda.empty_cache()

Evaluation¶

Finally, after the long process of training, validation and testing, we can have a look at the TensorBoard. Please make sure that TensorBoard is running!

In [32]:
notebook.list()
print('\nPlease check, if the port is correct and tensorboard is running!\n')
notebook.display(port=6006,height=1000)
Known TensorBoard instances:
  - port 6006: logdir ./Example_DeforestationDetection/DeforestationDetectionRun/results/tensorboard/ (started 0:04:57 ago; pid 117201)

Please check, if the port is correct and tensorboard is running!

Selecting TensorBoard with logdir ./Example_DeforestationDetection/DeforestationDetectionRun/results/tensorboard/ (started 0:04:57 ago; port 6006, pid 117201).
In [ ]: